April 2020 Archives

Perl Weekly Challenge 57: Tree Inversion and Shortest Unique Prefix

These are some answers to the Week 57 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a couple of hours. This blog post offers some solutions to this challenge, please don’t read on if you intend to complete the challenge on your own.

Task 1: Tree Inversion

You are given a full binary tree of any height, similar to the one below:

    1
   /  \
  2    3
 / \  / \
4   5 6  7

Write a script to invert the tree, by mirroring the children of every node, from left to right. The expected output from the tree above would be:

    1
   /  \
  3    2
 / \  / \
7   6 5  4

The input can be any sensible machine-readable binary tree format of your choosing, and the output should be the same format.

Bonus: In addition to the above, you may wish to pretty-print your binary tree in a human readable text-based format as above.

I'll definitely take the bonus, because making auxiliary subroutines to represent graphically the trees is the best way to check that inversion subroutine works correctly (or to see the errors, if any). But I will not represent the tree edges (the / and \ connecting vertically the letters), because it becomes a bit difficult with 4 levels and more or less unmanageable (and quite ugly) when there are more that 4 levels. For example, I chose to represent a 5-level binary tree as follows:

                1
        2               3
    4       5       6       7
  8   9   a   b   c   d   e   f
 g h i j k l m n o p q r s t u v

I decided to implement two different subroutines for the display: one bft (breadth-first traversal) subroutine to construct an intermediate array of arrays in which each level of the tree is contained in one subarray:

[[1] [2 3] [4 5 6 7] [8 9 a b c d e f] [g h i j k l m n o p q r s t u v]]

and one display subroutine to produce the graphical ASCII representation. The reason for doing that is that the display subroutine can thus be reused, independently of the internal tree representation.

Tree Inversion In Perl

I have discussed in my blog post of last week 3 different ways to represent a binary tree: hash of hashes, array of arrays and a simple flat array, but presented only the array of arrays solution in Perl. This week, I'll use an array of arrays and a flat array.

Tree Inversion Using an Array of Arrays

For each node, the first array item is the current value, the second item the left child and the third item the right child. For example, the binary tree shown in the task description could be initialized as follows:

my $tree = [1, [2, [4], [5]], [3, [6], [7]]];

In this implementation, the bft and invert subroutines are both recursive to perform a depth-first traversal of the binary tree.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

my $tree = [1, [2, [4, [8], [9]], [5, ['a'], ['b']]], 
    [3, [6, ['c'], ['d']], [7, ['e'], ['f']]]];

sub invert {
    my $node = shift;
    invert($node->[1]) if defined $node->[1];
    invert($node->[2]) if defined $node->[2];
    ($node->[1], $node->[2]) = ($node->[2], $node->[1])
        if defined $node->[1] and defined $node->[2]
}
my @bft_tree;
bft ($tree, 0);
say "Tree before inversion:";
display(\@bft_tree);
invert($tree);
@bft_tree = ();
bft ($tree, 0);
say "\nTree after inversion:";
display(\@bft_tree);

sub bft {          # Breadth First Traversal
    my ($node, $level) = @_;   
    push @{$bft_tree[$level]}, $node->[0];
    bft($node->[1], $level + 1) if defined $node->[1];
    bft($node->[2], $level + 1) if defined $node->[2];
}
sub display {
    my $bft_tree = shift;
    my $start = scalar @{$bft_tree->[-1]};
    my $sep_val = (2 * $start) - 1;
    for my $line (@$bft_tree) {
        my $sep = " " x $sep_val;
        say " " x $start, join $sep, @$line;
        $start /= 2;
        $sep_val = ($sep_val - 1) / 2;
    }
}

This is the output displayed by this program:

$ perl invert_tree.pl
Tree before inversion:
        1
    2       3
  4   5   6   7
 8 9 a b c d e f

Tree after inversion:
        1
    3       2
  7   6   5   4
 f e d c b a 9 8

Tree Inversion Using a Flat Array

Binary trees can also be stored in breadth-first order as an array with an implicit data structure. This is similar to what is commonly done for binary heaps (i.e. a binary tree that keeps a partial order). Here, we're not interested with partial order, but the idea is to use an array with the following properties. The item with subscript 0 is the value of the root node. The index of an element is used to compute the index of its parent and the indices of its children. The basic idea is that, for any node, the index of its parent is about half the index of the current node, and, conversely, the indices of the children are about twice the index of the current node. More precisely, for a tree starting at index 0, the exact formulas for a node with index $n are commonly as follows:

  • parent: int( ($n-1)/2 )
  • left child: 2*$n + 1
  • right child: 2*$n + 2

The root node is at index 0, and its children are at positions 1 and 2. The children of item with index 1 are at positions 3 and 4 and the children of 2 are at positions 5 and 6.

These rules may seem a bit complicated (and it is a bit tedious to compute these things manually), but they're in fact quite easy to implement in a program:

sub children { my $i = shift; 2*$i + 1, 2*$i + 2 }
sub parent { my $i = shift; ($i-1) / 2; }

The parent subroutine is provided here for the purpose of completeness, it is not needed in our program.

Note that it is very easy to populate the binary-heap-like array from a graphical representation: you just need to perform a breadth-first traversal (and provide empty slots for missing nodes, but that's not necessary here, since we are only dealing with full binary trees). For example, this binary tree:

    1
   /  \
  2    3
 / \  / \
4   5 6  7

can be encoded as:

my $tree = [1 , 2, 3, 4, 5, 6, 7];

or even:

my $tree = [1 .. 7];

Now, the invert subroutine becomes very simple, since we can use the bft to get an array of arrays by level, reverse the components and flatten the overall structure:

sub invert {  
    my $bft_tree = bft(shift);
    return [ map {reverse @$_} @$bft_tree ];
}

or even with a single code line:

sub invert {  
    return [ map { reverse @$_ } @{bft(shift)} ];
}

The bft subroutine could be a recursive subroutine as before:

sub bft2 {
    my ($index, $level) = @_;   
    push @{$bft_tree[$level]}, $tree->[$index];
    my ($left, $right) = children $index;
    bft($left, $level + 1) if defined $tree->[$left];
    bft($right, $level + 1) if defined $tree->[$right];
}

but I find it simpler to use a while loop to traverse the tree:

sub bft {               # Breadth First Traversal
    my $tree = shift;
    my ($index, $level) = (0, 0);
    my @bft_tree;
    while ($index < scalar @$tree) {
        my $new_index = $index + 2 ** $level - 1;
        push @{$bft_tree[$level++]}, @{$tree}[$index .. $new_index];
        $index = $new_index + 1;
    }
    return \@bft_tree;
}

This is the final code for the whole program:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

sub children { my $i = shift; 2*$i+1, 2*$i+2 }
sub parent { my $i = shift; ($i-1)/2; }  # not needed here

sub display {
    my $bft_tree = bft(shift);
    my $start = scalar @{$bft_tree->[-1]};
    my $sep_val = (2 * $start) - 1;
    for my $line (@$bft_tree) {
        my $sep = " " x $sep_val;
        say " " x $start, join $sep, @$line;
        $start /= 2;
        $sep_val = ($sep_val - 1) / 2;
    }
}
sub bft {               # Breadth First Traversal
    my $tree = shift;
    my ($index, $level) = (0, 0);
    my @bft_tree;
    while ($index < scalar @$tree) {
        my $new_index = $index + 2 ** $level - 1;
        push @{$bft_tree[$level++]}, @{$tree}[$index .. $new_index];
        $index = $new_index + 1;
    }
    return \@bft_tree;
}
sub invert {  
    return [ map { reverse @$_ } @{bft(shift)} ];
}

my $tree = [ 1..9, 'a'..'v' ];
say "\nTree before inversion";
display $tree;
my $inverted_tree = invert($tree);
say "\nInverted tree";
display $inverted_tree;

This program produces the following output:

$ perl invert_tree2.pl
Tree before inversion
                1
        2               3
    4       5       6       7
  8   9   a   b   c   d   e   f
 g h i j k l m n o p q r s t u v

Inverted tree
                1
        3               2
    7       6       5       4
  f   e   d   c   b   a   9   8
 v u t s r q p o n m l k j i h g

Tree Inversion In Raku

Using a Flat Array

We'll start with a flat array. Please refer to the Tree Inversion Using a Flat Array section just above for explanations about the use of flat arrays to store binary trees. The Raku code below is essentially a port of the Perl code:

use v6;

sub children (Int $i) { 2*$i+1, 2*$i+2 }
sub parent (Int $i) { ($i-1)/2; }  # not needed here

sub display ($tree) {
    my @bft_tree = bft($tree);
    my $start = (@bft_tree[*-1]).elems;
    my $sep_val = (2 * $start) - 1;
    for @bft_tree -> @line {
        my $sep = " " x $sep_val;
        say " " x $start, join $sep, @line;
        $start /= 2;
        $sep_val = ($sep_val - 1) / 2;
    }
}
sub bft ($tree) {               # Breadth First Traversal
    my ($index, $level) = (0, 0);
    my @bft_tree;
    while ($index <= $tree.end) {
        my $new_index = $index + 2 ** $level - 1;
        (@bft_tree[$level++]).append($tree[$index .. $new_index]);
        $index = $new_index + 1;
    }
    return @bft_tree;
}
sub invert ($tree) {  
    return [ map { | reverse @$_ }, bft($tree) ];
}

my $tree = (1..9, 'a'..'v').flat;
say $tree;
say "\nTree before inversion";
display $tree;
my $inverted_tree = invert($tree);
say "\nInverted tree";
say "$inverted_tree\n";
display $inverted_tree;

Running the program displays more or less the same output as before:

$ perl6 invert_tree2.p6
(1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t u v)

Tree before inversion:
                1
        2               3
    4       5       6       7
  8   9   a   b   c   d   e   f
 g h i j k l m n o p q r s t u v

Inverted tree:
1 3 2 7 6 5 4 f e d c b a 9 8 v u t s r q p o n m l k j i h g

                1
        3               2
    7       6       5       4
  f   e   d   c   b   a   9   8
 v u t s r q p o n m l k j i h g

Using a Hash of Hashes

A hash of hashes is probably the most explicit and clearest implementation of a binary tree. But it tends to be quite verbose.

A node is a hash consisting in three elements: its value (an integer), its left child and its right child. The children may be undefined when we are at the lowest level of the tree (i.e. when the node is a "leaf"). So a node could be implemented as a hash with three keys, v (value), l (left child) and r (right child). The children, when they are defined, are themselves nodes, so the structure is nested and can be explored recursively. For example, the following binary tree:

    1
   /  \
  2    3
 / \  / \
4   5 6  7

can be encoded as:

my %tree =  v => 1, 
            l => { v => 2, l => {v => 4}, r => {v => 5} },
            r => { v => 3, l => {v => 6}, r => {v => 7} },
            ;

In this quick and simple implementation, we use global variables for the tree and for the breadth-first array, to avoid the pain of carrying them around back and forth in the successive recursive subroutine calls. In a real-life application, it would be more proper to pass them as arguments and return values of subroutines, or to use dynamic variables.

use v6;

my %tree =  v => 1, 
            l => { v => 2, l => {v => 4}, r => {v => 5} },
            r => { v => 3, l => {v => 6}, r => {v => 7} },
            ;
my @bft-tree;

sub display ($tree) {
    my $start = (@bft-tree[*-1]).elems;
    my $sep_val = (2 * $start) - 1;
    for @bft-tree -> @line {
        my $sep = " " x $sep_val;
        say " " x $start, join $sep, @line;
        $start /= 2;
        $sep_val = ($sep_val - 1) / 2;
    }
}
sub bft (%node, $level) {
    push @bft-tree[$level], %node<v>;
    bft(%node<l>, $level + 1) if defined %node<l>;
    bft(%node<r>, $level + 1) if defined %node<r>;
}
sub invert (%node) {
    invert(%node<l>) if defined %node<l>;
    invert(%node<r>) if defined %node<r>;
    (%node<l>, %node<r>) = %node<r>, %node<l>
        if defined %node<l> and defined %node<r>;
}
bft %tree, 0;
say "Tree before inversion:";
display(@bft-tree);
invert(%tree);
@bft-tree = ();
bft %tree, 0;
say "\nTree after inversion";
display(@bft-tree);

This program produces the following output:

$ ./perl6 invert_tree3.p6
Tree before inversion:
    1
  2   3
 4 5 6 7

Tree after inversion
    1
  3   2
 7 6 5 4

Task 2: Shortest Unique Prefix

Write a script to find the shortest unique prefix for each each word in the given list. The prefixes will not necessarily be of the same length.

Sample Input:

[ "alphabet", "book", "carpet", "cadmium", "cadeau", "alpine" ]

Expected Output:

[ "alph", "b", "car", "cadm", "cade", "alpi" ]

For solving this task, we'll need to examine every combination of two elements of the input list, which means essentially two nested loops. And checking the letters to obtain a unique prefix requires a third nested loop. To alleviate the combinational explosion, we'll start by storing each word in a hash of arrays in accordance with the word's initial letter, so that we can then compare only words with the same initial letter.

Shortest Unique Prefix in Perl

This is my Perl implementation:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

my @words = qw /alphabet book carpet cadmium cadeau alpine/;
my @prefixes;
my %letters;
for my $word (@words) {
    push @{$letters{ substr $word, 0, 1 }}, $word;
}    
for my $letter (sort keys %letters) {
    push @prefixes, $letter and next if @{$letters{$letter}} == 1;
    my $candidate;
    for my $word1 ( @{$letters{$letter}} ) {
        my $prefix_length = 1;
        for my $word2 (@{$letters{$letter}} ) {
            next if $word1 eq $word2;
            my $i = 1;
            $i++ while substr($word1, $i, 1) eq substr($word2, $i, 1);
            if ($i > $prefix_length) {
                $candidate = substr($word1, 0, $i + 1);
                $prefix_length = $i
            }
        }
    push @prefixes, $candidate;
    }
}
say "@prefixes";

The output is correct:

$ perl prefix.pl
alph alpi b car cadm cade

Shortest Unique Prefix in Raku

Since I'm not entirely satisfied with my Perl implementation, which is a bit too complicated in my view (but was too lazy to change it), I won't port my Perl program to Raku this time, but will try a different approach. This is my Raku implementation:

use v6;

my @words = <alphabet foo book carpet cadmium cadeau alpine foxtrot>;
my @prefixes;
my %letters;
%letters.push(substr($_, 0, 1) =>  $_) for @words;
for %letters.keys.sort -> $let {
    push @prefixes, $let and next if %letters{$let}.elems == 1;
    my $candidate;
    for %letters{$let}.flat -> $word {
        for 2..$word.chars -> $i {
            my $cand = substr $word, 0, $i;
            my $count = %letters{$let}.grep({$cand eq substr($_, 0, $i)}).elems; 
            push @prefixes, $cand and last if $count == 1;
        }
    }
}
say @prefixes;

And this is the output:

$ ./perl6 prefix.p6
[alph alpi b car cadm cade foo fox]

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, May 3, 2020. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 56: Diff-k and Path Sum

These are some answers to the Week 56 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a couple of days (April 19, 2020). This blog post offers some solutions to this challenge, please don’t read on if you intend to complete the challenge on your own.

Task 1: Diff-k

You are given an array @N of positive integers (sorted) and another non negative integer k.

Write a script to find if there exists 2 indices i and j such that A[i] - A[j] = k and i != j.

It should print the pairs of indices, if any such pairs exist.

Example:

@N = (2, 7, 9)
$k = 2

Output : 2,1

Since the array items are not necessarily adjacent and we have to print all the matching pairs, I do not see any other way than basically trying all pairs. Well, since the array is sorted, we don’t really need to test all possible pairs, but only all combinations of 2 elements of the input array.

Diff-k in Perl

There are some CPAN modules to generate combinations, but, as usual, I consider that it would somewhat cheating to use a ready-made solution. So, I’ll do it “the hard way” and manually generate the combinations. This is quite simple. The program uses two nested loops to iterate over the array, and prints out the pairs for which the difference is the target. The target difference and the array are passed as two arguments to the program. If no argument is passed, then the program uses some default values.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub find_diff {
    my ($diff, @array) = @_;
    for my $i (0..$#array - 1) {
        for my $j ($i +1.. $#array) {
            say "Indices $j and $i (values: $array[$j], $array[$i])" 
                if $array[$j] - $array[$i] == $diff;
        }
    }
}
my $k = shift // 2;
my @N = @ARGV;
@N = (2, 7, 9) unless @N;
find_diff $k, @N;

Here are some sample runs:

$ perl find_diff.pl
Indices 2 and 1 (values: 9, 7)

$ perl find_diff.pl 2 4 5 7 9 11 15 17
Indices 2 and 1 (values: 7, 5)
Indices 3 and 2 (values: 9, 7)
Indices 4 and 3 (values: 11, 9)
Indices 6 and 5 (values: 17, 15)

$ perl find_diff.pl 4 4 5 7 9 11 15 17
Indices 3 and 1 (values: 9, 5)
Indices 4 and 2 (values: 11, 7)
Indices 5 and 4 (values: 15, 11)

Diff-k in Raku

Raku has a built-in method, combinations, which can generate all combinations of two (or any other number, or even a range of numbers) items from an input list. So, our program will just generate all combinations of indices and print out those matching the criteria:

use v6;

sub find-diff ($diff,  @array) {
    for (0..@array.end).combinations: 2 -> ($i, $j) {
        say "Indices $j and $i (values: @array[$j], @array[$i])" 
            if @array[$j] - @array[$i] == $diff;
    }
}
my ($k, @N);
if @*ARGS.elems > 2 {
    ($k, @N) = @*ARGS;
} else {
    $k = 2;
    @N = 2, 7, 9;
}
find-diff $k, @N;

The program uses arguments passed to it (or default values if there isn’t enough arguments).

Here are some sample runs:

$ perl6 find_diff.p6
Indices 2 and 1 (values: 9, 7)

$ perl6 find_diff.p6  2 4 5 7 9 11 15 17
Indices 2 and 1 (values: 7, 5)
Indices 3 and 2 (values: 9, 7)
Indices 4 and 3 (values: 11, 9)
Indices 6 and 5 (values: 17, 15)

$ perl6 find_diff.p6  4 4 5 7 9 11 15 17
Indices 3 and 1 (values: 9, 5)
Indices 4 and 2 (values: 11, 7)
Indices 5 and 4 (values: 15, 11)

Task 2: Path Sum

You are given a binary tree and a sum, write a script to find if the tree has a path such that adding up all the values along the path equals the given sum. Only complete paths (from root to leaf node) may be considered for a sum.

Example: given the below binary tree and sum = 22,

      5
     / \
    4   8
   /   / \
  11  13  9
 /  \      \
7    2      1

For the given binary tree, the partial path sum 5 → 8 → 9 = 22 is not valid.

The script should return the path 5 → 4 → 11 → 2 whose sum is 22.

So basically we have to implement a depth-first tree traversal algorithm. Once this is done, finding the paths matching the criteria is quite easy.

The first question to be answered is: how do we represent a binary tree? There are a number of possibilities. We’ll just present three.

The most obvious way might be a nested hash of hashes. Each node by a hash with three items: the current node value, a reference to the left child and a reference to the right child. For example, the top of the binary tree shown above could look like this: { val => 5, left => {val => 4, left => { val => 11}}, right => { val => 8, left => { val => 13}, right { val => 9 }}}. Or, in a more graphical way:

{ val => 5, 
  left => {
    val => 4, 
    left => { 
        val => 11
    }
  }, 
  right => { 
    val => 8, 
    left => { 
        val => 13
        }, 
        right { 
            val => 9 
        }
    }
}

But that’s quite verbose, I don’t like doing so much typing. A more concise way would to use a nested array of arrays. For each node, the first array item is the current value, the second item the left child and the third item the right child. The top of the tree shown above might look like this: [5, [4, [11]], [8, [13], ]]. Or, more graphically:

[
    5, 
    [
        4, [11]
       ], 
    [
        8, [13] 
    ]
]

We could even use a simple flat array in a way similar to what is commonly done for binary heaps (i.e. a binary tree that keeps a partial order). Here we’re not interested with partial order, but the idea is to use an array with the following properties. The item with subscript 0 is the value of the root node. The index of an element is used to compute the index of its parent and the indices of its children. The basic idea is that, for any node, the index of its parent is about half the index of the current node, and, conversely, the indices of the children are about twice the index of the current node. More precisely, for a tree starting at index 0, the exact formulas for a node with index $n are commonly as follows:

  • parent: int( ($n-1)/2 )
  • left child: 2*$n + 1
  • right child: 2*$n + 2

The root node is at index 0, and its children are at positions 1 and 2. The children of item with index 1 are at positions 3 and 4 and the children of 2 are at positions 5 and 6.

These rules may seem a bit complicated (and it is a bit tedious to compute these things manually), but they’re in fact quite easy to implement and the binary tree:

      5
     / \
    4   8
   /   / \
  11  13  9

would be represented by this simple array:

[5, 4, 8, 11, , 13, 9]

We will implement such a data structure in the Raku solutions below.

Note that it is very easy to populate the binary-heap-like array from a graphical representation: you just need to perform a breadth-first traversal and provide empty slots for missing nodes.

Path Sum in Perl

We’ll use a nested array of arrays to represent the binary tree. We implement a recursive dfs (for depth-first search) subroutine to traverse the various paths of the tree. At each call of the subroutine, we keep track of the current sum and of the current path. When we reach a leaf (no more child), we print the path if the current sum is equal to the target value.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my $tree = [5, [4, [11, [7], [2]]], [8, [13], [9, [1]]]] ;

sub dfs {
    my ($node, $target, $sum, $path) = @_;
    my $new_sum = $sum + $node->[0];
    unless (exists $node->[1] or exists $node->[2]) {
        say $new_sum, " -> @$path $node->[0]" if $new_sum == $target;
    }
    dfs($node->[1], $target, $new_sum, [@$path, $node->[0]]) 
        if defined $node->[1];
    dfs($node->[2], $target, $new_sum, [@$path, $node->[0]]) 
        if defined $node->[2];
}

my $target = shift // 22;
dfs($tree, $target, 0, []);

The default target is 22, but we can pass another value to the program.

Here are a few runs:

$ perl bin_tree_sum.pl
22 -> 5 4 11 2

$ perl  bin_tree_sum.pl 23
23 -> 5 8 9 1

$ perl  bin_tree_sum.pl 22
22 -> 5 4 11 2

$ perl  bin_tree_sum.pl 27
27 -> 5 4 11 7

$ perl  bin_tree_sum.pl 26
26 -> 5 8 13

Path Sum in Raku

We’ll implement two solutions for the tree.

Implementing the Tree as a Nested Array of Arrays

This is a port to Raku of our Perl program above:

use v6;

my @tree = [5, [4, [11, [7], [2]]], [8, [13], [9, [1]]]] ;

sub dfs (@node, $target, $sum, @path) {
    my $new-sum = $sum + @node[0];
    unless @node[1]:exists or @node[2]:exists {
        say $new-sum, " -> @path[] @node[0]" if $new-sum == $target;
    }
    dfs(@node[1], $target, $new-sum, (@path, @node[0]).flat) 
        if defined @node[1];
    dfs(@node[2], $target, $new-sum, (@path, @node[0]).flat) 
        if defined @node[2];
}

my $target = @*ARGS.elems == 1 ?? @*ARGS[0] !! 22;
dfs(@tree, $target, 0, []);

Here are a few runs:

$ perl6  bin_tree_sum.p6
22 -> 5 4 11 2

$ perl6  bin_tree_sum.p6 22
22 -> 5 4 11 2

$ perl6  bin_tree_sum.p6 24

$ perl6  bin_tree_sum.p6 26
26 -> 5 8 13

$ perl6  bin_tree_sum.p6 23
23 -> 5 8 9 1

Implementing the Tree as a Flat Array (Binary-Heap-like)

As explained above, we can use a flat array to represent a binary tree, with the following rules: the indices of the children of a node with index $n are as follows:

  • left child: 2*$n + 1
  • right child: 2*$n + 2

In Raku, it isn’t possible to just leave an “empty slot” when defining an array. We need to provide undefined values, such as, for example, Nil, Any, or Int. We’ll use Int since it is the most consistent option with a tree made of integers.

The code isn’t much more complicated than before:

use v6;

my @tree = [5, 4, 8, 11, Int, 13, 9, 7, 2, Int, Int, Int, Int, 1];

sub dfs ($index, $target, $sum, @path) {
    sub children ($i) { 2*$i+1, 2*$i+2 }
    my $cur-val = @tree[$index];
    my $new-sum = $sum + $cur-val;
    my ($left, $right) = children $index; 
    unless defined @tree[$left] or defined @tree[$right] {
        say $new-sum, " -> @path[] $cur-val" if $new-sum == $target;
    }
    dfs($left, $target, $new-sum, (@path, $cur-val).flat) 
        if defined @tree[$left];
    dfs($right, $target, $new-sum, (@path, $cur-val).flat) 
        if defined @tree[$right];
}

my $target = @*ARGS.elems == 1 ?? @*ARGS[0] !! 22;
my $root-node = 0;
dfs($root-node, $target, 0, []);

Here are a few runs:

$ perl6 bin_tree_sum2.p6
22 -> 5 4 11 2

$ perl6 bin_tree_sum2.p6 22
22 -> 5 4 11 2

$ perl6 bin_tree_sum2.p6 23
23 -> 5 8 9 1

$ perl6 bin_tree_sum2.p6 24

$ perl6 bin_tree_sum2.p6 26
26 -> 5 8 13

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, April 26, 2020. And, please, also spread the word about the Perl Weekly Challenge if you can.

Revisiting the Collatz Sequence (PWC 54)

In my blog post related to Perl Weekly Challenge 54 posted on April 4, 2020, the section about the "extra credit" task concerning the Collatz conjecture described in some details the difficulties encountered when trying to cache the data: the volume of data is very large. I'm blogging again on the subject because of new findings.

The Collatz conjecture concerns a sequence defined as follows: start with any positive integer n. Then each term is obtained from the previous term as follows: if the previous term is even, the next term is one half of the previous term. If the previous term is odd, the next term is 3 times the previous term plus 1. For example, the Collatz sequence for 23 is this:

23 70 35 106 53 160 80 40 20 10 5 16 8 4 2 1

The conjecture is that, no matter what input value of n, the sequence will always reach 1. It is usually believed to be true (and no counter-example has ever been found), but, despite a lot of efforts, nobody has been able to prove it, and this is deemed to be a very difficult problem.

Computing the Collatz sequence of a given number is fairly easy and can be done in a simple one-liner:

$ perl -E '$n = shift; print "$n "; while ($n != 1) { $n = $n % 2 ? 3 * $n + 1 : $n / 2; print "$n "} ' 26
26 13 40 20 10 5 16 8 4 2 1

The extra-credit task was to calculate the sequence length for all starting numbers up to 1000000 (1e6), and output the starting number and sequence length for the longest 20 sequences.

The Original Solution

In theory, it wouldn't be very complicated to encapsulate the above code into a loop to compute the Collatz sequence for any range of numbers. Except that going all the way up to 1,000,000 is probably going to take a very long time. One of the reason is that we are going to recompute Collatz sequence successors for the same number again and again many times. If you look at the two above examples, the sequences both end up with the following series: 40 20 10 5 16 8 4 2 1. So, it might be useful, when we reach 40 for the first time, to compute the end of the sequence only once, and to store it in a hash of arrays (or possibly an array of arrays), in order to retrieve it straight from the hash when we reach 40 once more. And, of course, we can reuse the end of the sequence when computing the Collatz sequence for 40, 80, 160, as well as 52, 104, etc. Such a strategy is called caching or memoizing: storing in memory the result of a computation that we’re likely to have to compute again. It is sometimes described as “trading memory for time.”

Since we want to compute the Collatz sequence for all integers up to 1,000,000, the cache will grow very large (several millions of sequences) and we might run out of memory. In the first version of the program below, I tried to store all sequences up to one million, and the program turned out to be painfully slow. Looking at the system statistics, I found that, after a while, available memory became exhausted and the system would swap memory on the disk, leading to very slow execution. I made a couple of tests, and found that I could store the sequences for all numbers up to about 300,000 without exceeding the available memory of my computer (that number might be different on your computer), thus preventing the process from swapping and getting more or less the optimal performance, hence the MAX constant set to 300,000. Since I knew from earlier tests that the 20 longest sequences would all have more than 400 items, I also hard-coded a lower limit of 400 items for the sequences whose length had to be recorded. Another possibly better solution might have been to maintain a sliding array of the top 20 sequences, but I feared that maintaining this array many times over the execution of the program would end up impairing performance.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;
use constant MAX => 300000;

my %cache;

sub collatz_seq {
    my $input = shift;
    my $n = $input;
    my @result;
    while ($n != 1) {
        if (exists $cache{$n}) {
            push @result, @{$cache{$n}};
            last;
        } else {
            my $new_n = $n % 2 ? 3 * $n + 1 : $n / 2;
            push @result, $new_n;
            $cache{$n} = [$new_n, @{$cache{$new_n}}] 
                if defined ($cache{$new_n}) and $n < MAX;
            $n = $new_n;
        }
    }
    $cache{$input} = [@result] if $n < MAX;
    return @result;
}

my @long_seqs;
for my $num (1..1000000) {
    my @seq = ($num, collatz_seq $num);
    push @long_seqs, [ $num, scalar @seq] if scalar @seq > 400;
}

@long_seqs = sort { $b->[1] <=> $a->[1]} @long_seqs;
say  "$_->[0]: $_->[1]" for @long_seqs[0..19];

With these optimizations, I was able to reduce execution time to 1 min 7 sec.:

$ time perl collatz.pl
837799: 525
626331: 509
939497: 507
704623: 504
910107: 476
927003: 476
511935: 470
767903: 468
796095: 468
970599: 458
546681: 452
818943: 450
820022: 450
820023: 450
410011: 449
615017: 447
886953: 445
906175: 445
922524: 445
922525: 445

real    1m7,469s
user    1m6,015s
sys     0m1,390s

Changing the Caching Strategy

A couple of days after I submitted my solution to the Perl Weekly Challenge and posted my blog post mentioned above, I figured out that my caching strategy was in fact quite inefficient: the program doesn't need to cache the full sequence, it would be enough to just store the number of its items. And that reduces considerably the memory footprint and other overhead of the cache.

I originally did not try this change in Perl (and did not intend to do it), but I did it with the Raku solution. Changing the caching strategy made the Raku program 6 times faster (see below).

On April 5, 2020 (one day after my original blog post), 1nick published a very interesting message on the Perl Monks forum in which he presented another strategy: parallelizing the process using MCE::Map Each worker is handed only the beginning and end of the chunk of the sequence it will process, and workers communicate amongst themselves to keep track of the overall task. With this change (and no caching), this program ran 5 times faster, on a 12-core machine (the full program is presented in Nick's post). Following that initial post, an extremely interesting discussion emerged between Nick and several other Perl monks. I really cannot summarize this discussion here, just follow the link if you're interested (it's really worth the effort). Note that I saw this Perl Monks thread of discussion only on April 14.

Given that discussion on the Perl Monks forum, I felt compelled to implement the modified caching strategy (caching the sequence lengths rather than the sequences themselves) in the Perl version.

The computer on which I ran the next test is slower than the one where I ran those above. These are the timings of my original program for this computer:

real    1m37,551s
user    1m9,375s
sys     0m21,031s

This is now my first implementation with the new caching strategy:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use constant MAX => 1000000;

my %cache = (2 => 2);

sub collatz_seq {
    my $input = shift;
    my $n = $input;
    my $result = 0;
    while ($n != 1) {
        if (exists $cache{$n}) {
            $result += $cache{$n};
            last;
        } else {
            my $new_n = $n % 2 ? 3 * $n + 1 : $n / 2;
            $result++;
            $cache{$n} = $cache{$new_n} + 1 
                if defined $cache{$new_n} and $n < MAX;
            $n = $new_n;
        }
    }
    $cache{$input} = $result if $input < MAX;
    return $result;
}

my @long_seqs;
for my $num (1..1000000) {
    my $seq_length = collatz_seq $num;
    push @long_seqs, [ $num, $seq_length ] if $seq_length > 400;
}

@long_seqs = sort { $b->[1] <=> $a->[1]} @long_seqs;
say  "$_->[0]: $_->[1]" for @long_seqs[0..19];

This program produces the same outcome, but is nearly 3 times faster:

real    0m34,207s
user    0m34,108s
sys     0m0,124s

It's pretty good, but still not as good as Nick's parallelized solution (which ran 5 times faster).

Using an Array Instead of a Hash

But we now end up with a cache having essentially one entry per input number in the 1..1000000 range. So, I thought, perhaps it might be better to use an array, rather than a hash, for the cache (accessing an array item should be faster than a hash lookup).

This is the code for this new implementation:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use constant MAX => 1000000;

my @cache = (0, 1, 2);

sub collatz_seq {
    my $input = shift;
    my $n = $input;
    my $result = 0;
    while ($n != 1) {
        if (defined $cache[$n]) {
            $result += $cache[$n];
            last;
        } else {
            my $new_n = $n % 2 ? 3 * $n + 1 : $n / 2;
            $result++;
            $cache[$n] = $cache[$new_n] + 1 
                if defined $cache[$new_n] and $n < MAX;
            $n = $new_n;
        }
    }
    $cache[$input] = $result if $input < MAX;
    return $result;
}

my @long_seqs;
for my $num (1..1000000) {
    my $seq_length = collatz_seq $num;
    push @long_seqs, [ $num, $seq_length ] if $seq_length > 400;
}

@long_seqs = sort { $b->[1] <=> $a->[1]} @long_seqs;
say  "$_->[0]: $_->[1]" for @long_seqs[0..19];

With this new implementation, we still obtain the same result, but the program is now more than 55 times faster than my original one (and almost 20 times faster than the solution using a hash for the cache):

$ time perl collatz3.pl
837799: 525
626331: 509
[Lines omitted for brevity]
922524: 445
922525: 445

real    0m1,755s
user    0m1,687s
sys     0m0,061s

I strongly suspected that using an array would be faster, but I frankly did not expect such a huge gain until I tested it.

So, it is true that throwing more CPU cores at the problem makes the solution significantly faster (although to a limited extent with my computer that has only 4 cores). But using a better algorithm can often be a better solution. The best, of course, would be to do both, and this can be done, as we will see below.

Further Optimizations

After I presented those results on the Perl Monks forum, another Perl monk, Mario Roy, the person who wrote the MCE::Map used by Nick and a number of other very useful Perl modules for parallel processing, suggested three further optimizations:

1. Replaced division by 2.

$n >> 1;

2. Removed one level of branching.

while ($n != 1) {
    $result += $cache[$n], last
        if defined $cache[$n];

    my $new_n = $n % 2 ? 3 * $n + 1 : $n >> 1;
    $result++;
    $cache[$n] = $cache[$new_n] + 1
        if defined $cache[$new_n] and $n < $max;

    $n = $new_n;
}

3. Lastly, reduced the number of loop iterations.

while ($n != 1) {
    $result += $cache[$n], last
        if defined $cache[$n];

    $n % 2 ? ( $result += 2, $new_n = (3 * $n + 1) >> 1 )
           : ( $result += 1, $new_n = $n >> 1 );

    $cache[$n] = $cache[$new_n] + ($n % 2 ? 2 : 1)
        if defined $cache[$new_n] and $n < $max;

    $n = $new_n;
}

On his computer and with a larger range (up to 1e7 instead of 1e6), he obtained the following timings:

collatz3_a.pl 1e7  13.130s  (a) original
collatz3_b.pl 1e7  12.394s  (b) a + replaced division with >> 1
collatz3_c.pl 1e7  12.261s  (c) b + removed 1 level of branching
collatz3_d.pl 1e7   9.170s  (d) c + reduced loop iterations

So, that's about 30% faster. Interesting, I would not have thought such micro-optimizations would provide such a significant gain. I’ll have to remember that. But that was just the first step of Mario’s approach, the really good things come now.

Combining Caching and Parallel Execution

In another Perl Monks post, Mario Roy showed how to combine caching with parallel execution using the File::Map module that implements mapped memory, which can be shared between threads or forked processes. With a 32-core CPU, Mario was able to reduce the execution duration to less than 0.7 second! Wow! Please follow the link for the details.

So, yes, it is possible to combine caching with parallel execution.

New Caching Strategy in Raku

As mentioned earlier, when the idea came to me to store the sequence lengths rather than the sequences themselves, I originally tried to implement it in Raku. I'll cover that in detail in my review of the Raku solutions, but let me provide here a summary.

Remember that the original solution took about 9 minutes with Raku.

This is the first implementation (using sequence length in a hash):

use v6;

my %cache = 2 => 2;

sub collatz-seq (UInt $in) {
    my $length = 0;
    my $n = $in;
    while $n != 1 {
        if %cache{$n} :exists {
            $length += %cache{$n};
            last;
        } else {
            my $new_n = $n % 2 ?? 3 * $n + 1 !! $n / 2;
            $length++;
            %cache{$n} = %cache{$new_n} + 1 
                if defined (%cache{$new_n}) and $new_n <= 2000000;
            $n = $new_n.Int;
        }
    }
    %cache{$in} = $length if $in <= 2000000;
    return $length;
}

my @long-seqs;
for 1..1000000 -> $num {
    my $seq-length = collatz-seq $num;
    push @long-seqs, [ $num, $seq-length] if $seq-length > 400;
}
@long-seqs = sort { $^b[1] <=> $^a[1]}, @long-seqs;
say  "$_[0]: $_[1]" for @long-seqs[0..19];

This new program displays the same output as the previous one, but runs about 6 times faster:

$ time perl6 collatz2.p6
837799: 525
626331: 509
939497: 507
[Lines omitted for brevity]
906175: 445
922524: 445
922525: 445

real    1m31,660s
user    0m0,000s
sys     0m0,062s

This is the code for the implementation using an array instead of a hash for the cache:

use v6;

my @cache = 0, 1, 2;

sub collatz-seq (UInt $in) {
    my $length = 0;
    my $n = $in;
    while $n != 1 {
        if defined @cache[$n] {
            $length += @cache[$n];
            last;
        } else {
            my $new_n = $n % 2 ?? 3 * $n + 1 !! $n / 2;
            $length++;
            @cache[$n] = @cache[$new_n] + 1 
                if defined @cache[$new_n] and $new_n <= 2000000;
            $n = $new_n.Int;
        }
    }
    @cache[$in] = $length;
    return $length;
}

my @long-seqs;
for 2..1000000 -> $num {
    my $seq-length = collatz-seq $num;
    push @long-seqs, [ $num, $seq-length] if $seq-length > 200;
}
@long-seqs = sort { $^b[1] <=> $^a[1]}, @long-seqs;
say  "$_[0]: $_[1]" for @long-seqs[0..19];

And the new program runs about twice faster than with a hash (and 12 times faster than the original code):

$ time ./perl6 collatz3.p6
837799: 525
626331: 509
[Lines omitted for brevity]
906175: 445
922524: 445
922525: 445

real    0m45,735s
user    0m0,015s
sys     0m0,046s

Interestingly, the Perl program runs 3 times faster after the first optimization, and 55 times faster after the second optimization, where as the Raku program runs 6 times faster after the first optimization and 12 times faster after the second one. It is not necessarily surprising that some optimizations work better with one language and others with another language, but I somehow did not expect such a large discrepancy.

Raku has a very good support for parallel execution and concurrent programming. I'm pretty sure it should be possible to make good use of this capability, but I haven't really looked at that topic for at least four years, so I don't think I could come up with a good parallel solution without spending quite a bit of effort and time. Also, with my poor computer with only four cores, I would certainly not be able to get results anywhere close to Mario Roy with his 32-core platform.

Wrapping-up

Perl Weekly Challenge 56 is up for your perusal!

Perl Weekly Challenge 55: Binary Numbers and Wave Arrays

These are some answers to the Week 55 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Task # 1: Flipping Binary Numbers

You are given a binary number B, consisting of N binary digits 0 or 1: s0, s1, …, s(N-1).

Choose two indices L and R such that 0 ≤ L ≤ R < N and flip the digits s(L), s(L+1), …, s(R). By flipping, we mean change 0 to 1 and vice-versa.

For example, given the binary number 010, the possible flip pair results are listed below:

L=0, R=0 the result binary: 110
L=0, R=1 the result binary: 100
L=0, R=2 the result binary: 101
L=1, R=1 the result binary: 000
L=1, R=2 the result binary: 001
L=2, R=2 the result binary: 011

Write a script to find the indices (L,R) that results in a binary number with maximum number of 1s. If you find more than one maximal pair L,R then print all of them.

Continuing our example, note that we had three pairs (L=0, R=0), (L=0, R=2), and (L=2, R=2) that resulted in a binary number with two 1s, which was the maximum. So we would print all three pairs.

There may be an analytical solution. For example, we may look for the longest sequence of 0s. But that’s not guaranteed to produce the maximum number of 1s. For example, the longest sequence of 0 may be 00000. But if we have somewhere else the sequence, 000010000, then is would be better to flip that sequence. It seems quite difficult to automatize the analysis. Especially, it seems difficult to make sure that we find all maximum index pairs. So we’ll use brute force: try all possibilities and pick up the best one(s).

Flipping Binary Numbers in Perl

The brute force algorithm is quite straight forward. We use nested loops to iterate over every possible $left-$right pair and store the index pair and the resulting string into an array (with the index being the number of 1s). Then, we just pick up the items with the highest array subscript:

use strict;
use warnings;
use feature "say";

sub flip {
    my $bin_nr = shift;
    die "Please supply a binary string." 
        unless $bin_nr =~ /^[01]*$/;
    my @chars = split //, $bin_nr;
    my @result;
    for my $left (0..$#chars) {
        for my $right ($left..$#chars) {
            my @tmp_chars = @chars;
            for my $i ($left..$right) {
                $tmp_chars[$i] = $chars[$i] ? 0 : 1;
            }
            my $count = scalar grep $_ == 1, @tmp_chars;
            $result[$count] .= "$left-$right: @tmp_chars\n";
        }
    }
    return $result[-1];
}
say flip shift // "01011" ;

Running the program a couple of times produces the following output:

$ perl binstr.pl 01001110000011
7-11: 0 1 0 0 1 1 1 1 1 1 1 1 1 1

$ perl binstr.pl 010011100010011
7-12: 0 1 0 0 1 1 1 1 1 1 0 1 1 1 1

Flipping Binary Numbers in Raku

We just use the same brute-force algorithm in Raku:

use v6;

sub flip ($bin-nr) {
    my @chars = $bin-nr.comb;
    my @result;
    for 0..@chars.end -> $left {
        for $left..@chars.end -> $right {
            my @tmp-chars = @chars;
            for $left..$right -> $i {
                @tmp-chars[$i] = @chars[$i] == 1  ?? 0 !! 1;
            }
            my $count = [+] @tmp-chars;
            @result[$count] ~= "$left-$right: @tmp-chars[]\n";
        }
    }
    return @result[*-1];
}
sub MAIN (Str $in where $in ~~ /^ <[01]>+ $/ = "01011") {
    say flip $in;
}

Running this program with the same input binary strings displays the same output as before:

$ perl6 binstr.p6 01001110000011
7-11: 0 1 0 0 1 1 1 1 1 1 1 1 1 1

$ perl6 binstr.p6 010011100010011
7-12: 0 1 0 0 1 1 1 1 1 1 0 1 1 1 1

Task 2: Wave Arrays

Any array N of non-unique, unsorted integers can be arranged into a wave-like array such that n1 ≥ n2 ≤ n3 ≥ n4 ≤ n5 and so on.

For example, given the array [1, 2, 3, 4], possible wave arrays include [2, 1, 4, 3] or [4, 1, 3, 2], since 2 ≥ 1 ≤ 4 ≥ 3 and 4 ≥ 1 ≤ 3 ≥ 2. This is not a complete list.

Write a script to print all possible wave arrays for an integer array N of arbitrary length.

Notes:

When considering N of any length, note that the first element is always greater than or equal to the second, and then the ≤, ≥, ≤, … sequence alternates until the end of the array.

Since we want to find all possible wave arrays, we’ll need to explore all possibilities, and we need again brute force. The pure brute force algorithm would be to generate all permutations and retain those matching the wave criteria. We can use an improved brute-force solution that builds only the permutations whose beginning matches the wave criteria, thereby reducing significantly the number of possibilities to explore.

Wave Arrays in Perl

To build the permutations, we use the add_1_item recursive subroutine that is called with three arguments: a mode, the input values and the output values. The mode is a flip-flop Boolean variable that tells us if the next item should be greater than the previous one (or equal), or if it should be less. Each time we add an item, we flip $mode from 1 to 0 or vice-versa. The add_1_item subroutine picks each of the input values, adds it to the output if the wave criteria is met, and it calls itself recursively. Note that our first implementation simply printed each result when found. But that did not work properly when there was some duplicate values in the input, as it would print several times the same wave sequences (which is probably undesired). Therefore, we’ve put the outcome in the %results hash to remove duplicate wave sequences before printing them.

use strict;
use warnings;
use feature "say";

my %results;

sub add_1_item {
    my ($mode, $input, $output) = @_;
    unless (@$input) {
        $results{"@$output"} = 1;
        return;
    }
    my $last = $output->[-1];
    for my $i (0..$#$input) {
        if ($mode == 0) {
            next if $input->[$i] > $last;
            add_1_item(1, [@$input[0..$i-1, $i+1..$#$input]], 
                [@$output, $input->[$i]]);

        } else {
            next if $input->[$i] < $last;
            add_1_item(0, [@$input[0..$i-1, $i+1..$#$input]], 
                [@$output, $input->[$i]]);
        }
    }
}
my @in = (1, 2, 3, 4);
@in = @ARGV if defined $ARGV[0];
for my $i (0..$#in) {
    add_1_item(0, [@in[0..$i-1, $i+1..$#in]], [$in[$i]]);
}
say for sort keys %results;

We display here two sample runs:

$ perl wave.pl 1 2 3 4
2 1 4 3
3 1 4 2
3 2 4 1
4 1 3 2
4 2 3 1

$ perl wave.pl  3 4 5 2 1
2 1 4 3 5
2 1 5 3 4
3 1 4 2 5
3 1 5 2 4
3 2 4 1 5
3 2 5 1 4
4 1 3 2 5
4 1 5 2 3
4 2 3 1 5
4 2 5 1 3
4 3 5 1 2
5 1 3 2 4
5 1 4 2 3
5 2 3 1 4
5 2 4 1 3
5 3 4 1 2

Wave Arrays in Raku

This is a port to Raku of the previous Perl program:

use v6;

my SetHash $results;

sub add_1_item ($mode, @input, @output) {
    unless @input.elems {
        $results{"@output[]"}++;
        return;
    }

    my $last = @output[*-1];
    for 0..@input.end -> $i {
        if ($mode == 0) {
            next if @input[$i] > $last;
            add_1_item(1, @input[0..$i-1, $i+1..@input.end].flat, 
                (@output, @input[$i]).flat);

        } else {
            next if @input[$i] < $last;
            add_1_item(0, @input[0..$i-1, $i+1..@input.end].flat, 
                (@output, @input[$i]).flat);
        }
    }
}

my @in = 1, 2, 3, 4;
@in = @*ARGS if @*ARGS.elems > 0;
for 0..@in.end -> $i {
    my @out = @in[$i],;
    add_1_item(0, @in[0..$i-1, $i+1..@in.end].flat, @out);
}
.say for $results.keys.sort;

This is the output for two sample runs:

$ perl6 wave.p6  3 4 2 1
2 1 4 3
3 1 4 2
3 2 4 1
4 1 3 2
4 2 3 1

$ perl6 wave.p6   3 4 5 2 1
2 1 4 3 5
2 1 5 3 4
3 1 4 2 5
3 1 5 2 4
3 2 4 1 5
3 2 5 1 4
4 1 3 2 5
4 1 5 2 3
4 2 3 1 5
4 2 5 1 3
4 3 5 1 2
5 1 3 2 4
5 1 4 2 3
5 2 3 1 4
5 2 4 1 3
5 3 4 1 2

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, April 19, 2020. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge 54: k-th Permutation Sequence and the Collatz Conjecture

These are some answers to the Week 54 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a couple of days (April 5, 2020). This blog post offers some solutions to this challenge, please don’t read on if you intend to complete the challenge on your own.

Task 1: kth Permutation Sequence

Write a script to accept two integers n (>=1) and k (>=1). It should print the kth permutation of n integers. For more information, please follow the wiki page.

For example, n=3 and k=4, the possible permutation sequences are listed below:

123
132
213
231
312
321

The script should print the 4th permutation sequence 231.

It took me some questioning to figure out the requirement. My understanding is that the program should first generate a list of integers between 1 and n and then look for permutations in ascending order, and finally display the kth permutation.

kth Permutation in Perl

We write a recursive permute subroutine that generates permutations in the desired ascending order. Then we can just stop recursion once it has been called k times (thus avoiding to calculate all permutations when no needed).

use strict;
use warnings;
use feature "say";

my ($n, $k) = @ARGV;
my $err_msg = "Please supply two integer parameters freater than 0\n";
die $err_msg unless $n and $k;
die $err_msg if $n !~ /^\d{1,2}$/ or $k !~ /^\d+$/;
my @start = 1..$n;
permute("", @start);

sub permute {
    my ($str, @vals) = @_;
    if (scalar @vals == 0) {
        say $str and exit unless --$k; 
        return;
    }
    permute("$str " . $vals[$_], @vals[0..$_-1], @vals[$_+1..$#vals]) for 0..$#vals;
}

With the parameters n=3 and k=4, the program displays the following output:

$ perl permute.pl 3 4
 2 3 1

Note that I have decided to insert a space between the individual digits, as it makes it easier to visualize the individual values of the output when n is greater than 9 (and thus has more than one digit). For example, for the 350,000th permutation of the 1-35 range:

$ time perl permute.pl 35 350000
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 35 32 30 27 33 29 28 34 31

real    0m0,928s
user    0m0,890s
sys     0m0,030s

This is fairly fast: we’ve just computed the first 350,000 permutations of 35 items in less than one second. In case you need more speed with larger input values, you might try modules like Algorithm::Permute or ntheory. They are likely to be significantly faster. But I did not feel it was needed here.

kth Permutation in Raku

Raku has a built-in method, permutations that returns all possible permutations of a list as a Seq of lists. In addition, if the input list is in ascending order, the output permutation will also be in ascending order.

For example, with an input list of 1, 2, 3, the fourth permutation is:

perl6 -e 'say (1..3).permutations[3];'
(2 3 1)

In addition, although the documentation doesn’t state it explicitly, it appears that the permutations method acts lazily, i.e. it only generates the permutations needed for computing the desired final result. For example, the following one-liner computes the result (the 4th permutation) almost immediately (in less than one hundredth of a second):

$ perl6 -e 'say (1..20).permutations[3]; say now - INIT now;'
(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 19 20 18)
0.0089752

which would obviously not be the case if it had to compute every permutation of a 20-item list before finding the fourth one (for a 20-item list, the number of permutations is 20!, i.e. 2432902008176640000, or about 2.4 billions of billions).

So we can write a one-liner script that accepts two integers i and k and prints the kth permutation of n integers as per the requirement:

$ perl6 -e 'say (1..@*ARGS[0]).permutations[@*ARGS[1]-1];' 3 4
(2 3 1)

If you prefer a stand-alone script, we can write this:

use v6;

sub MAIN (Int $n where * > 0, Int $k where * > 0) {
    (1..$n).permutations[$k - 1].say;
}

This outputs the same result as before:

$ ./perl6 permute.p6 3 4
(2 3 1)

Task 2: the Collatz Conjecture

It is thought that the following sequence will always reach 1:

$n = $n / 2 when $n is even
$n = 3*$n + 1 when $n is odd

For example, if we start at 23, we get the following sequence:

23 → 70 → 35 → 106 → 53 → 160 → 80 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

Write a function that finds the Collatz sequence for any positive integer. Notice how the sequence itself may go far above the original starting number.

Extra Credit: have your script calculate the sequence length for all starting numbers up to 1000000 (1e6), and output the starting number and sequence length for the longest 20 sequences.

The Collatz conjecture concerns a sequence defined as follows: start with any positive integer n. Then each term is obtained from the previous term as follows: if the previous term is even, the next term is one half of the previous term. If the previous term is odd, the next term is 3 times the previous term plus 1. The conjecture is that, no matter what value of n, the sequence will always reach 1. This conjecture is named after Lothar Collatz who introduced it in 1937. It is sometimes known as the Syracuse problem (and some other names). It is usually believed to be true (and no counter-example has been found), but, despite a lot of efforts, nobody has been able to prove it, and this is deemed to be a very difficult problem.

The Collatz Conjecture in Perl

The Basic Task

For the purpose of the basic task, this is fairly straight forward. Here, we write a next_collatz subroutine that, given an integer computes the next number in the Collatz sequence. And we call that subroutine in a loop until we reach 1:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub next_collatz { 
    my $num = shift;
    $num % 2 ? 3 * $num + 1 : $num / 2;
}

my $n = shift;
my @result = ($n);
while (1) {
    $n = next_collatz $n;
    push @result, $n;
    last if $n == 1;
}
say "@result";

These are some example outputs:

$ perl collatz.pl 23
23 70 35 106 53 160 80 40 20 10 5 16 8 4 2 1

$ perl collatz.pl 24
24 12 6 3 10 5 16 8 4 2 1

$ perl collatz.pl 25
25 76 38 19 58 29 88 44 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1

$ perl collatz.pl 26
26 13 40 20 10 5 16 8 4 2 1

$ perl collatz.pl 27
27 82 41 124 62 31 94 47 142 71 214 107 322 161 484 242 
121 364 182 91 274 137 412 206 103 310 155 466 233 700 
350 175 526 263 790 395 1186 593 1780 890 445 1336 668 
334 167 502 251 754 377 1132 566 283 850 425 1276 638 
319 958 479 1438 719 2158 1079 3238 1619 4858 2429 
7288 3644 1822 911 2734 1367 4102 2051 6154 3077 9232 
4616 2308 1154 577 1732 866 433 1300 650 325 976 488 
244 122 61 184 92 46 23 70 35 106 53 160 80 40 20 
10 5 16 8 4 2 1

(The latest example just above was slightly reformatted for the purpose of this blog post.)

Extra Credit: Collatz Sequence for all Numbers up to 1,000,000

In theory, it wouldn’t be very complicated to encapsulate the above program into an outer loop to compute the Collatz sequence for any range of numbers. Except that going all the way up to 1,000,000 is probably going to take ages. One of the reason is that we’re going to recompute Collatz sequence successors for the same number again and again many times. If you look at the above examples, the sequences all end up with the following sequence: 16 8 4 2 1. So, it might be useful, when we reach 16 for the first time, to compute the end of the sequence only once, and to store it in a hash of arrays (or possibly an array of arrays), in order to retrieve it straight from the hash when we reach 16 once more. Similarly, the sequence for 25 end with 40 20 10 5 16 8 4 2 1. If we store this sequence somewhere, then we don’t have to compute it once more when we reach 40 while computing the Collatz sequence for 27, and, or course, also when we compute the Collatz sequence for 40, 80, 160, as well as 13, 26, 52, etc. Such a strategy is called caching or memoizing: storing in memory the result of a computation that we’re likely to have to compute again. It is sometimes described as “trading memory for time.”

There is a core module, called Memoize, written my Mark Jason Dominus, that is very easy to use can do the caching automatically for you. The problem though is that it wouldn’t be very practical to use it here, because we don’t want to cache just the next item in the sequence, but all the rest of the sequence down to 1. So it might be better to implement a cache ourselves, manually (that’s not very difficult, as we shall see).

There is another problem though, which is much more delicate. Since the requirement is to compute the Collatz sequence for all integers up to 1,000,000, the cache will grow very large (several millions of sequences) and we might run out of memory. In the first version of the program below, I tried to store all sequences up to one million, and the program turned out to be painfully slow. Looking at the system statistics, I found that, after a while, available memory became exhausted and the system would swap memory on the disk, leading to very slow execution. I made a couple of tests, and found that I could store the sequences for all numbers up to about 300,000 without exceeding the available memory of my computer (that number might be different on your computer), thus preventing the process from swapping and getting more or less the best possible performance, hence the MAX constant set to 300,000. Since I knew from earlier tests that the 20 longest sequences would all have more than 400 items, I also hard-coded a lower limit of 400 items for the sequences whose length had to be recorded. Another possibly better solution might have been to maintain a sliding array of the top 20 sequences, but I feared that maintaining this array many times over the execution of the program would end up impairing performance.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;
use constant MAX => 300000;

my %cache;

sub collatz_seq {
    my $input = shift;
    my $n = $input;
    my @result;
    while ($n != 1) {
        if (exists $cache{$n}) {
            push @result, @{$cache{$n}};
            last;
        } else {
            my $new_n = $n % 2 ? 3 * $n + 1 : $n / 2;
            push @result, $new_n;
            $cache{$n} = [$new_n, @{$cache{$new_n}}] 
                if defined ($cache{$new_n}) and $n < MAX;
            $n = $new_n;
        }
    }
    $cache{$input} = [@result] if $n < MAX;
    return @result;
}

my @long_seqs;
for my $num (1..1000000) {
    my @seq = ($num, collatz_seq $num);
    push @long_seqs, [ $num, scalar @seq] if scalar @seq > 400;
}

@long_seqs = sort { $b->[1] <=> $a->[1]} @long_seqs;
say  "$_->[0]: $_->[1]" for @long_seqs[0..19];
# say "@{$cache{$long_seqs[0][0]}}";

With these optimizations, I was able to reduce execution time to 1 min 7 sec.:

$ time perl collatz.pl
837799: 525
626331: 509
939497: 507
704623: 504
910107: 476
927003: 476
511935: 470
767903: 468
796095: 468
970599: 458
546681: 452
818943: 450
820022: 450
820023: 450
410011: 449
615017: 447
886953: 445
906175: 445
922524: 445
922525: 445

real    1m7,469s
user    1m6,015s
sys     0m1,390s

Uncomment the last statement if you want to see the longest sequence (with 525 items).

Update: A couple of days after I posted this, I figured out a much better caching strategy removing the difficulties explained above and giving much better performance. It is explained in this blog post.

The Collatz Conjecture in Raku

The Basic Task

For the purpose of the basic task, this is fairly straight forward. Just as for the Perl solution, we write a collatz-seq subroutine that, given an integer computes the next number in the Collatz sequence. And we call that subroutine in a loop until we reach 1:

use v6;

sub collatz-seq (UInt $in) {
    my $n = $in;
    my @result = gather {
        while $n != 1 {
            my $new-n = $n % 2 ?? 3 * $n + 1 !! $n / 2;
            take $new-n;
            $n = $new-n;
        }
    }
    return $in, |@result;
}
sub MAIN (UInt $in) {
    my @seq = collatz-seq $in;
    print "Collatz sequence for $in: ", @seq, "\n";
}

Here are a few sample runs:

$ perl6  collatz_1.p6 8
Collatz sequence for 8: 8 4 2 1

$ perl6  collatz_1.p6 23
Collatz sequence for 23: 23 70 35 106 53 160 80 40 20 10 5 16 8 4 2 1

$ perl6  collatz_1.p6 25
Collatz sequence for 25: 25 76 38 19 58 29 88 44 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1

Note that I used the print function rather than the say function here, because say would abbreviate long sequences (for example, the sequence for 27 would end with an ellipsis: ...).

Extra Credit: Collatz Sequence for all Numbers up to 1,000,000

Please refer to the Extra Credit subsection in the Perl section above for a detailed discussion of the caching strategy. The program below is essentially a port to Raku of the Perl program above:

use v6;

my %cache = 2 => [2, 1];

sub collatz_seq (UInt $in) {
    my @result;
    my $n = $in;
    while $n != 1 {
        if %cache{$n} :exists {
            push @result, |@(%cache{$n});
            last;
        } else {
            my $new_n = $n % 2 ?? 3 * $n + 1 !! $n / 2;
            push @result, $new_n;
            %cache{$n} = [$new_n, |%cache{$new_n}] 
                if defined (%cache{$new_n}) and $new_n <= 200000;
            $n = $new_n.Int;
        }
    }
    %cache{$in} = @result if $in <= 200000;
    return @result;
}

my @long_seqs;
for 1..1000000 -> $num {
    my $seq = collatz_seq $num;
    push @long_seqs, [ $num, $seq.elems] if $seq.elems > 400;
}
@long_seqs = sort { $^b[1] <=> $^a[1]}, @long_seqs;
say  "$_[0]: $_[1]" for @long_seqs[0..19];

This program displays more or less the same output as the previous Perl program:

$ perl6 collatz.p6
837799: 525
626331: 509
939497: 507
704623: 504
910107: 476
927003: 476
511935: 470
767903: 468
796095: 468
970599: 458
546681: 452
818943: 450
820022: 450
820023: 450
410011: 449
615017: 447
886953: 445
906175: 445
922524: 445
922525: 445

This program ran in more than 9 minutes, so Raku is still significantly slower than Perl (at least for such CPU intensive computations).

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, April 12, 2020. And, please, also spread the word about the Perl Weekly Challenge if you can.

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.