July 2019 Archives

Perl Weekly Challenge # 19: Weekends and Wrapping Lines

These are some answers to the Week 19 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (August 4, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Five Weekends in a Month

Write a script to display months from the year 1900 to 2019 where you find 5 weekends i.e. 5 Friday, 5 Saturday and 5 Sunday.

This time, I'll start with Perl 6, because the built-in Date type seems to make it easier.

Five Weekends in Perl 6

My first idea was to loop over each month in the range 1900-2019. For each month, find the first Friday and then count the number of Sundays after that in the month. Then, even before I started to code that, it came to my mind that I didn't need to loop over all the days of the month, but just to count how many days there were in the month after the first Friday: there will five weekends if there are more than 29 days (4 weeks plus 1 day) after that first Friday of the month.

use v6;

for 1900..2019 -> $year {
    for 1..12 -> $month {
        my $day = 1;
        my $date = Date.new($year, $month, $day);
        my $last-day-of-month = 
            $date.later(month => 1).earlier(day => 1);
        ++$date until $date.day-of-week == 5;
        say $year, "-", $month.fmt("%02d"), " has 5 weekends" 
            if $last-day-of-month - $date > 29;
    }
}

That works fine:

1901-03 has 5 weekends
1902-08 has 5 weekends
1903-05 has 5 weekends
1904-01 has 5 weekends
...
(lines omitted for brevity)
...
2016-01 has 5 weekends
2016-07 has 5 weekends
2017-12 has 5 weekends
2019-03 has 5 weekends

Then I started to check the result and looked at a calendar, and it became obvious to me that it is actually even much simpler than that: to have 5 full weekends (Friday through Sunday), a month needs to have 31 days (so January, March, May, etc.) and to start with a Friday. So this is my new simpler script:

use v6;

for 1900..2019 -> $year {
    for 1, 3, 5, 7, 8, 10, 12 -> $month {
        say "$year-{$month.fmt("%02d")} has 5 weekends." 
            if Date.new($year, $month, 1).day-of-week == 5;
    }
}

This prints the same as before, there no point repeating the output.

Five Weekends in Perl 5

I originally thought that solving the challenge in Perl 5 would require either complex computations or the use of sophisticated modules such as Date::Calc or Time::Piece. Given the simplification found in the course of solving the problem in Perl 6, we only need the Time::Local core module, which provides functions that are the inverse of built-in localtime() and gmtime() functions.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Time::Local;

for my $year (0..119) {
    for my $month ( 1, 3, 5, 7, 8, 10, 12) {
        my $date = timegm (0, 0, 0, 1, $month - 1, $year);
        say $year + 1900, "-", sprintf("%02d ", $month)
            if (gmtime $date)[6] == 5;
    }
}

This program produces the same output as the P6 solution above.

Wrapping Lines

Write a script that can wrap the given paragraph at a specified column using the greedy algorithm.

For this, we will suppose our wrap subroutine receives a line of text as a parameter and return a string wrapped to fit a certain maximal width. We will also suppose that the input text is plain ASCII, some changes may have to be done for Unicode text.

Wrapping Lines in Perl 6

The most typical way to solve such a problem is usually to split the input into words and to add words to a line until it goes over the maximal width; at this point, the script removes the last word, produces the line and starts the new iteration with the removed word.

I tend to think it will be more efficient to look for a space backward (with the rindex built-in function) from the maximal length position, because this will be doing less string manipulations:

use v6;

sub wrap (Str $line is copy, Int $width) {
    my $out = '';
    while ($line) {
        return $out ~ "$line\n" if $line.chars < $width;
        my $pos = rindex $line, ' ', $width - 1;
        $out = $out ~ substr($line, 0, $pos) ~ "\n";
        $line = substr $line, $pos+1;
    }
    return $out;
}

my $in = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Dolor sed viverra ipsum nunc aliquet bibendum enim. In massa tempor nec feugiat. Nunc aliquet bibendum enim facilisis gravida. Nisl nunc mi ipsum faucibus vitae aliquet nec ullamcorper. Amet luctus venenatis lectus magna fringilla.";

say wrap $in, $_ for 60, 35;

This produces the following output:

$ perl6 wrap.p6
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Dolor sed viverra ipsum nunc aliquet bibendum enim.
In massa tempor nec feugiat. Nunc aliquet bibendum enim
facilisis gravida. Nisl nunc mi ipsum faucibus vitae
aliquet nec ullamcorper. Amet luctus venenatis lectus magna
fringilla.

Lorem ipsum dolor sit amet,
consectetur adipiscing elit, sed
do eiusmod tempor incididunt ut
labore et dolore magna aliqua.
Dolor sed viverra ipsum nunc
aliquet bibendum enim. In massa
tempor nec feugiat. Nunc aliquet
bibendum enim facilisis gravida.
Nisl nunc mi ipsum faucibus vitae
aliquet nec ullamcorper. Amet
luctus venenatis lectus magna
fringilla.

Wrapping Lines in Perl 5

Since Perl 5 also has the rindex built-in function, we can just do the same in Perl 5:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub wrap {
    my ($line, $width) = @_;
    my $out = '';
    while ($line) {
        return $out . "$line\n" if length $line < $width;
        my $pos = rindex $line, ' ', $width - 1;
        $out = $out . substr($line, 0, $pos) . "\n";
        $line = substr $line, $pos+1;
    }
    return $out;
}

my $in = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Dolor sed viverra ipsum nunc aliquet bibendum enim. In massa tempor nec feugiat. Nunc aliquet bibendum enim facilisis gravida. Nisl nunc mi ipsum faucibus vitae aliquet nec ullamcorper. Amet luctus venenatis lectus magna fringilla.";
say wrap $in, $_ for 60, 35;

This produces the same output as the Perl 6 implementation, so I won't repeat it here.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 11. And, please, also spread the word about the Perl Weekly Challenge if you can.

And see you hopefully at the European Perl Conference in Riga, Aug. 7 to 9, 2019.

Perl Weekly Challenge # 18: Priority Queues and Binary Heaps In Perl 6

In this previous blog post, I provided some answers to the Week 18 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

However, I omitted the Perl 6 solution to the priority queues part of the challenge, because I wanted to use a binary heap to solve it, and this required too many explanations: the blog post was just getting too long. So this post will complete the previous post and look into binary heaps in Perl 6.

Spoiler Alert: This weekly challenge deadline is due in several days from now (July 28, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

This part of the challenge goes like this:

Write a script to implement Priority Queue. It is like regular queue except each element has a priority associated with it. In a priority queue, an element with high priority is served before an element with low priority. Please check this wiki page for more information. It should serve the following operations:

  • 1) is_empty: check whether the queue has no elements.

  • 2) insertwithpriority: add an element to the queue with an associated priority.

  • 3) pullhighestpriority_element: remove the element from the queue that has the highest priority, and return it. If two elements have the same priority, then return element added first.

As already explained in my previous post, there are numerous ways to design simple priority queues (at least when performance is not an issue, for instance if the data set isn't very large). For example, it might be sufficient to maintain an array of arrays (AoA), where each of the arrays is a pair containing the item and associated priority. Or an array of hashes (AoH) based on the same idea. This means that each time we want to pull the highest priority element, we need to traverse the whole data structure to find the item with the highest priority. This may be quite slow when there are many items, but this may not matter if our data structure only has a few dozen items.

Another way is to build a hash of arrays (HoA), where the hash keys are the priorities and the hash values are references to arrays. When the number of priorities is relatively small (compared to the number of items in the queues), this tends to be more efficient, but note that we still need to traverse the keys of the hash until we find the highest priority. An AoA with the index being the priority and the sub-hashes the item might be more efficient (because the priorities remain sorted), but this requires the priorities to be relatively small positive integers. We still have to traverse the top data structure until we find the first non-empty sub-array. We used this data structure for our Perl 5 implementation and can translate our P5 code into Perl 6 as follows:

use v6;

sub new-queue {
    my @queue;  # an AoA
    sub is_empty {
        for @queue -> $item {
            next unless defined $item;
            return False if $item.elems > 0;
        }
        True;
    }
    sub insert_with_prio ($item, Int $prio) {
        push @queue[$prio], $item;
    }
    sub pull_highest_prio {
        for reverse @queue -> $item {
            next unless defined $item;
            return shift $item if $item.elems > 0;
        }
    }
    return &is_empty, &insert_with_prio, &pull_highest_prio;
}

my (&is-empty, &insert, &pull-prio) = new-queue;
# Testing the above code
for 1..20 -> $num {
    insert($num,
        $num %% 10 ?? 10 !!
        $num %% 5  ?? 5  !!
        $num %% 3  ?? 3  !!
        $num %% 2  ?? 2  !! 
        1);
}        
for 1..20 -> $num {
    say pull-prio;
}
say "Empty queue" if is-empty();

As in the P5 version, we're using functional programming to implement a pseudo-object system, in which the new-queue subroutine is acting as an object constructor, although it is implemented as a function factory. The @queue object is limited to the scope of the new-queue constructor, but the three methods returned to the caller are closures that can access to the content of their shared object, @queue. Please refer to the other blog post mentioned above for further details on this implementation and on the test code at the end. This script displays more or less the same as the P5 implementation:

$ perl6  queues.p6
10
20
5
15
3
6
9
12
18
2
4
8
14
16
1
7
11
13
17
19
Empty queue

Another possibility is to use a heap, a data structure that usually has better performance (when it matters). This what we will look into now.

Background on Binary Heaps

A binary heap is a binary tree that keeps a partial order: each node has a value less than its parent and larger than either of its two children; there is no specific order imposed between siblings. (You may also do it the other way around: you can design heaps in which any node has a value larger than its parent, you basically only need to reverse the comparison.)

Because there is no order between siblings, it is not possible to find a particular element without potentially searching the whole heap. Therefore, a heap is not very good if you need random access to specific nodes. But if you're interested in always finding the largest (or smallest) item, then a heap is a very efficient data structure.

Heaps are used for solving a number of computer science problems, and also serve as the basis for an efficient and very popular sorting technique called heap sort.

For a human, it is useful to represent a heap in a tree-like form. But a computer can store a heap as a simple array (not even a nested array). For this, the index of an element is used to compute the index of its parent or its two children. Roughly speaking, the children of an element are the two locations where the indices are about double its index; conversely, the parent of a node is located at about half its index. If the heap starts at index 0, the exact formulas for a node with index $n are commonly as follows:

  • Parent: int( ($n-1)/2 )
  • Left child: 2*$n + 1
  • Right child: 2*$n + 2

The root node is at index 0. Its children are at positions 1 and 2. The children of 1 are 3 and 4 and the children of 2 are 5 and 6. The children of 3 are 7 and 8, and so on.

Suppose we build a heap (in ascending order) from an array of all letters between a and v provided in any pseudo-random order, for example:

my @input =  <m t f l s j p o b h v k n q g r i a d u e c>;

The resulting @heap might be something like this:

[a b g d c k j l f h e m n q p t r o i u s v]

We will see below on how to build such a heap from an unordered array, but let's concentrate for now on the heap properties.

The order in the @heap above may not be immediately obvious, but a is the smallest letter, and its two children, b and g, are larger than a. The children of b are d and c and are larger than their parent b. Similarly, the children of g are k and j and are larger than their parent. And so on. But it is rather inconvenient to manually check that we have a valid heap. So, we may want to write a helper subroutine to display the heap in a slightly more graphical way:

sub print-heap (@heap) {
    my $start = 0;
    my $end = 0;
    my $last = @heap.end;
    my $step = 1;
    loop {
        say @heap[$start..$end];
        last if $end == $last;
        $start += $step;
        $step *= 2;
        $end += $step;
        $end = $last if $end > $last;
    } 
}

This subroutine will not be used in the final code, but it proved to be very useful for debugging purposes.

If we pass the letter heap as an argument to this subroutine, it will be displayed in the following format:

(a)
(b g)
(d c k j)
(l f h e m n q p)
(t r o i u s v)

With a little bit of reformatting we can now see its structure in a tree-like format:

                (a)
        (b              g)
    (d       c      k        j)
  (l  f    h  e   m  n     q   p)
(t r o i  u s v ...)

And from that, we can now easily draw the tree:

heap.png

The important thing to notice is that there is no particular order between siblings, but children are always larger than their parent.

How to build a Binary Heaps

Since we'll be dealing later with integers (priorities) in descending order, we will abandon our ascending order letter heap. Let's suppose we have this heap example taken from the implementation section of the Wikipedia page on heaps):

my @heap = 100, 19, 36, 17, 12, 25, 5, 9, 15, 6, 11;

For the time being, we will consider it is a global variable accessible anywhere in the file.

Our print-heap helper subroutine would display it as:

(100)
(19 36)
(17 12 25 5)
(9 15 6 11)

We can see it's a valid heap (the children are always smaller than their parent).

Let's now add a new item, say 45, at the end of this array (for example with the push function). Of course, this item is not at its right place and the array is no longer a valid heap, but we can now use the following subroutine to move items around in order to obtain again a valid heap:

sub add-in-heap ($index is rw) {
    my $index-val = @heap[$index];
    while ($index) {
        my $parent-idx = Int( ($index - 1) /2);
        my $parent-val = @heap[$parent-idx];
        last if $parent-val >    $index-val;
        @heap[$index] = $parent-val;
        $index = $parent-idx;
    }
    @heap[$index] = $index-val;
}

The parameter passed is the index of the item that has just been added at the end of the array (11 in this example). This subroutine looks at the value of the parent of this new item. If the parent is larger than the new item, then we're done: the new array happens to be a valid heap (which is not the case in our example). If not, then we move the parent value to the position where we've just added the new element. Then we change the index of interest to the parent and iterate this way until either the elements are in the right place (the parent value is larger than the current index value) or the index become 0 (we've reached the root node). At this point, the loop ends and we can put the value we've added in the right place. If you think about it in terms of the binary tree shown above, we're really exploring the single path from the added element to the root (although we may not have to go all the way up to the root), the rest of the heap remains untouched.

Note that this subroutine is not designed to do anything special when fed with duplicate values. Here, duplicates will he handled gracefully and returned in the correct order. So, that's OK, it works fine, but we'll have to do something special about it when we will implement priority queues (if we had two priorities with the same value in the heap, we would be unable to predict the order in which items having the same priority will be pulled).

This subroutine will move around items from parent to child from the end to the beginning of the array (or at least until the new added value finds its right place), so that we get a new valid heap:

[100 19 45 17 12 36 5 9 15 6 11 25]

Using the helper subroutine to display the new heap outputs this:

(100)
(19 45)
(17 12 36 5)
(9 15 6 11 25)

I'll leave it to you to draw the tree to check that it is a proper heap.

We now know how to add an item to a existing heap, we can of course use that subroutine to add an item to an empty heap, and we can use that subroutine repeatedly to place each item in its proper place in order to create a heap from an input list in any order:

for @array.keys -> $i {
    my $idx = $i;
    add-in-heap $idx;
}

At the end of this loop, the @array will have been turned into a heap.

Removing One Element from the Heap

If we're looking for the largest element, it will be the root of the tree, i.e. the first item of the array.

Now, if we want to use this data structure to manage a priority queue, we will need at some point to delete the value in the root node and to reorganize the array so that it becomes again a legitimate heap. When we remove (100) from the above array, we have to choose the largest item between the two children, i.e. 45 in our example, and promote it as a new root node. And we can then propagate similarly the needed changes until the end of the array.

But the thermometer outside my house now shows 43.6 °C (in the shade), and it is more than 37°C inside. So, I'll be a bit lazy for a moment and, rather than writing such a new subroutine (which should be done if you want to be efficient and will be done below), I'll consider the array with the root node removed as an array in no particular order and use the code already written (the add-in-heap subroutine) to build a new heap from it:

sub take-from-heap {
    my $result = shift @heap;
    for @heap.keys -> $i {
        my $idx = $i;
        add-in-heap $idx;
    }
    return $result;
}

If we run that subroutine on our existing heap, it will return the largest item (the root node, i.e. 100) to the caller and reorganize the rest of the array into a new heap:

[45 36 17 15 25 5 9 12 6 11 19]

(45)
(36 17)
(15 25 5 9)
(12 6 11 19)

OK, this works, but reconstructing the full heap each time we remove an item is somewhat inefficient, which goes against the very purpose of heaps. What should a proper take-from-heap subroutine do? Take a look again at the binary tree displayed above. If we take off the root node value (a), we should replace it by b which is larger than g. It should be clear that we won't need to change anything in the g sub-tree. And we can recursively replace b by c, and then c by e and finally e by v. Nothing else needs to be changed. So basically we have to move up one step each of the nodes on the path of the smallest nodes in the b sub-tree. And, by the way, it is thanks to the fact that, whether we add a new item or remove an item from the heap, we only need to traverse one single path through the heap that insertion and deletion operations have a 0(log n) complexity and are therefore fairly fast. Implementing the ideas just described is not too difficult, but, for each visited node, we need to take into account three possible cases: this node may have 0, 1 or 2 children.

sub take-from-heap {
    my $result = @heap[0];
    my $index = 0;
    loop {
        my $left-index = 2 * $index + 1;  
            # right-index is $left-index + 1
        unless (defined @heap[$left-index] or 
            defined @heap[$left-index + 1]) {
                @heap.splice($index, 1);
                last;
        }
        unless defined @heap[$left-index + 1] {
            @heap[$index] = @heap[$left-index]:delete;
            last;
        }
        unless defined @heap[$left-index] { # probably not happening
            @heap[$index] = @heap[$left-index + 1]:delete;
            last;
        }            
        # both children are defined if we get here
        my $next-index = ($left-index, 
            $left-index + 1).max({@heap[$_]});
        @heap[$index] = @heap[$next-index];
        $index = $next-index;
    }
    return $result;
}

If we run this new subroutine on our previous heap, we obtain this new heap:

[45 19 36 17 12 25 5 9 15 6 11]
(45)
(19 36)
(17 12 25 5)
(9 15 6 11)

Note that this is not the same heap as the one obtained before (same data but not in the same order), but this is another valid heap for such data. Using this subroutine repeatedly, we'll get the nodes in the same order: 45, 36, 25, 19, 17 etc. For example, let's run the new take-from-heap 10 times on our original heap and print out each time the removed first item and the resulting heap:

say "First item = ", take-from-heap, "; Heap: ", @heap for 1..10;

We can see that we have a valid heap at each iteration and pull the values in the right order:

First item = 100; Heap: [45 19 36 17 12 25 5 9 15 6 11]
First item = 45; Heap: [36 19 25 17 12 5 9 15 6 11]
First item = 36; Heap: [25 19 9 17 12 5 15 6 11]
First item = 25; Heap: [19 17 9 11 12 5 15 6]
First item = 19; Heap: [17 12 9 11 5 15 6]
First item = 17; Heap: [12 11 9 5 15 6]
First item = 12; Heap: [11 15 9 5 6]
First item = 11; Heap: [15 6 9 5]
First item = 15; Heap: [9 6 5]
First item = 9; Heap: [6 5]

So, it seems that we have a working algorithm to manage heaps. Let's turn now to priority queues.

A Priority Queue as a Heap

Basically, we want to manage our priorities with a heap, and each priority will be associated with an array containing the individual items in the order in which they were inserted. To give you immediately an idea of the data structure, the queue will look like this at a certain point during the execution of the tests in the script below:

[[10 [10 20]] [5 [5 15]] [2 [2 4 8 14 16]] [1 [1 7 11 13 17 19]] [3 [3 6 9 12 18]]]

The first item in the queue displayed above, [10 [10 20]], is the data structure for priority 10, which contains two elements, 10 and 20. The next one is for priority 5. And so on.

When we are inserting elements (item and priority), we first call insert_with_prio to check whether there is already an array for the given priority. If it already exists, we just add the item to the array of elements associated with this priority. If there no array with such priority, then we call add-to-queue to add a priority data structure into the heap (and reorganize the heap as we've done before). Similarly, when we call pull_highest_prio, we just pick up and return the first element from the data array of the first priority item. In the event that the data array of a given priority becomes empty, then we call take-from-heap to remove the priority data structure from the heap (and reorganize the heap as we've done before).

use v6;
sub new-queue {
    my @queue;  # an AoA
    sub is_empty {
        @queue.elems == 0;
    }
    sub insert_with_prio ($item, Int $prio) {
        my $index = first {@queue[$_][0] == $prio}, @queue.keys;
        if (defined $index) {
            push @queue[$index][1], $item;
        } else {
            push @queue, [$prio, [$item]];
            my $idx = @queue.end;
            add-to-queue($idx);
        }
    }        
    sub pull_highest_prio {
        return Nil if is-empty;
        my $result = shift @queue[0][1];
        take-from-heap if @queue[0][1].elems == 0;
        return $result;
    }
    sub add-to-queue ($index is rw) {
        my $index-val = @queue[$index];
        while ($index) {
            my $parent-idx = Int( ($index - 1) /2);
            my $parent-val = @queue[$parent-idx];
            last if $parent-val[0] > $index-val[0];
            @queue[$index] = $parent-val;
            $index = $parent-idx;
        }
        @queue[$index] = $index-val;
    }
    sub take-from-heap {
        my $index = 0;
        loop {
            my $left-index = 2 * $index + 1; 
                # right-index is $left-index + 1
            unless (defined @queue[$left-index] or 
                defined @queue[$left-index + 1]) {
                    @queue.splice($index, 1);
                    last;
            }
            unless defined @queue[$left-index + 1] {
                @queue[$index] = @queue[$left-index]:delete;
                last;
            }
            unless defined @queue[$left-index] {
                @queue[$index] = @queue[$left-index + 1]:delete;
                last;
            }            
            # both children are defined if we get here
            my $next-index = ($left-index, 
                $left-index + 1).max({@queue[$_][0]});
            @queue[$index] = @queue[$next-index];
            $index = $next-index;
        }
    }

    return &is_empty, &insert_with_prio, &pull_highest_prio;
}
my (&is-empty, &insert, &pull-prio) = new-queue;
# Testing the above code: 20 insertions and then trying 30 deletions
for 1..20 -> $num {
    insert($num,
        $num %% 10 ?? 10 !!
        $num %% 5  ?? 5  !!
        $num %% 3  ?? 3  !!
        $num %% 2  ?? 2  !! 
        1);
} 
for 1..30 -> $num {
    last if is-empty;
    say pull-prio;
}
say "Empty queue" if is-empty();

This program displays more or less the same as before:

$ perl6  heap_queue.p6
10
20
5
15
3
6
9
12
18
2
4
8
14
16
1
7
11
13
17
19
Empty queue

Adding some additional print statements shows how the priority queue is evolving when we pull elements from it:

[[10 [10 20]] [5 [5 15]] [2 [2 4 8 14 16]] [1 [1 7 11 13 17 19]] [3 [3 6 9 12 18]]]

[ ... lines omitted for brevity ...]

Pulled 18; New queue: [[2 [2 4 8 14 16]] [1 [1 7 11 13 17 19]]]
Pulled  2; New queue: [[2 [4 8 14 16]] [1 [1 7 11 13 17 19]]]
Pulled  4; New queue: [[2 [8 14 16]] [1 [1 7 11 13 17 19]]]
Pulled  8; New queue: [[2 [14 16]] [1 [1 7 11 13 17 19]]]
Pulled 14; New queue: [[2 [16]] [1 [1 7 11 13 17 19]]]
Pulled 16; New queue: [[1 [1 7 11 13 17 19]]]
Pulled  1; New queue: [[1 [7 11 13 17 19]]]
Pulled  7; New queue: [[1 [11 13 17 19]]]
Pulled 11; New queue: [[1 [13 17 19]]]
Pulled 13; New queue: [[1 [17 19]]]
Pulled 17; New queue: [[1 [19]]]
Pulled 19; New queue: []
Empty queue

The code is quite long and is certainly not worth the effort if we're going to manage only 20 data elements and 5 priorities, as in our test cases above. But with much larger datasets and wider ranges of priority, it should be more efficient than our other implementations. If we're going to use many priority queues, the code of the add-to-queue and take-from-heap subroutines could be stored separately in a module, making the new-queue code much smaller and more manageable.

Note that the insert_with_prio subroutine is traversing sequentially the heap to figure out wether the priority data struture already exists in the heap. Depending on the number of priorities, this might become time consuming. It would be easy to add and maintain a hash keeping track of the existing priorities and their position in the heap, to avoid sequential search. I did not do it because I considered this to be an implementation detail that may be or may not be useful depending on the exact circumstances. I would probably do it if I were to write a heap priority library for a CPAN module.

A Heap Priority Queue in Perl 5

Asides from the small syntax differences between the two languages, there is nothing in what we did above in Perl 6 that cannot be done almost identically in Perl 5. For the benefit of readers who might not know yet the Perl 6 syntax or may otherwise find it easier to test and play in Perl 5, we provide here a translation of the heap priority queue in Perl 5:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

sub new_queue {
    my @queue;  # an AoA
    my $add_to_queue = sub  {
        my $index = shift;
        my $index_val = $queue[$index];
        while ($index) {
            my $parent_idx = int( ($index - 1) /2);
            my $parent_val = $queue[$parent_idx];
            last if $parent_val->[0] > $index_val->[0];
            $queue[$index] = $parent_val;
            $index = $parent_idx;
        }
        @queue[$index] = $index_val;
    };
    my $take_from_heap = sub {
        my $index = 0;
        while (1) {
            my $left_index = 2 * $index + 1; 
                # right_index is $left_index + 1
            unless (defined $queue[$left_index] or 
                defined $queue[$left_index + 1]) {
                    splice @queue, $index, 1;
                    last;
            }
            unless (defined $queue[$left_index + 1]) {
                $queue[$index] = delete $queue[$left_index];
                last;
            }
            unless (defined $queue[$left_index]) {
                $queue[$index] = delete $queue[$left_index + 1];
                last;
            }            
            # both children are defined if we get here
            my $next_index = $queue[$left_index][0] 
                > $queue[$left_index + 1][0] ?
                    $left_index : $left_index + 1;
            $queue[$index] = $queue[$next_index];
            $index = $next_index;
        }
    };
    my $is_empty = sub  {
        scalar @queue == 0;
    };
    my $insert_with_prio = sub  {
        my ($item, $prio) = @_;
        my $index;
        for my $priority (0..$#queue) {
            if($queue[$priority][0] == $prio) {
                $index = $priority;
                last;
            }
        }
        if (defined $index) {
            push @{$queue[$index][1]}, $item;
        } else {
            push @queue, [$prio, [$item]];
            my $idx = $#queue;
            $add_to_queue->($idx);
        }
    };

    my $pull_highest_prio = sub  {
        return undef if $is_empty->();
        my $result = shift @{$queue[0][1]};
        $take_from_heap->() if scalar @{$queue[0][1]} == 0;
        return $result;
    };

    return $is_empty, $insert_with_prio, $pull_highest_prio;
}
my ($is_empty, $insert, $pull_prio) = new_queue;
# Testing the above code: 20 insertions and deletions
for my $num (1..20) {
    $insert->($num,
        $num % 10 == 0 ? 10 :
        $num % 5  == 0 ? 5  :
        $num % 3  == 0 ? 3  :
        $num % 2  == 0 ? 2  : 
        1);
} 
for my $num (1..30) {
    last if $is_empty->();
    say $pull_prio->();
}
say "Empty queue" if $is_empty->();

I won't repeat the output displayed on the screen when running this program, as it is exactly the same as what we had with the Perl 6 implementation. And I won't comment further on it: almost everything I said before about the P6 implementation holds with this P5 version, so please refer to the above explanations on the P6 version if you need.

Note however the John Macdonald (one of the authors of the book Mastering Algorithms with Perl published by O'Reilly) wrote a collection of heap modules available on the CPAN. You might want to try some of them, especially his implementation of binary heaps.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 4. And, please, also spread the word about the Perl Weekly Challenge if you can.

And see you hopefully at the European Perl Conference in Riga, Aug. 7 to 9, 2019.

Perl Weekly Challenge # 18: Longest Common Substrings, Priority Queues, and a Functional Object System

These are some answers to the Week 18 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (July 28, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Longest Common Substrings

Write a script that takes 2 or more strings as command line parameters and print the longest common substring. For example, the longest common substring of the strings “ABABC”, “BABCA” and “ABCBA” is string “ABC” of length 3. Other common substrings are “A”, “AB”, “B”, “BA”, “BC” and “C”. Please check this wiki page for details.

I can see at least two ways to tackle the problem (to simplify, let's say between two strings). One is to have two nested loops, one on the letters of the first string and the second one on the letters of the second string, and to store the substrings (or, possibly, the longest so far). The other is to generate all the substrings of each word and then to compare them. I'll use the first approach in Perl 5 and the second one in Perl 6 (because P6 has some functionalities making the second approach easy and interesting, and probably quite efficient).

Note that the programs below will consider only extended ASCII strings for simplicity. A couple of very minor changes might be needed for dealing properly with full Unicode strings.

Longest Common Substring in Perl 5

We loop over the input strings and, for each pair of input strings, use the compare2str subroutine to find the common substrings. The %common hash is used to store the common substrings (we use a hash because it automatically removes duplicate substrings (which we don't need) and it is easier and faster to filter the result in order to keep only substrings that are common to all input strings.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub compare2str {
    my ($str1, $str2) = @_;
    my @st1 = split //, $str1;
    my @st2 = split //, $str2;
    my %result;
    my $common = '';
    my ($i, $j) = (0, 0);
    while ($i <= $#st1) {
        while ($j <= $#st2) {
            if ($st1[$i] eq $st2[$j]) {
                $common .= $st1[$i];
                $result{$common} = 1;
                my ($k, $l) = ($i, $j);
                while (1) {
                    $k++; $l++; 
                    if ($k <= $#st1 and $l<= $#st2 
                        and $st1[$k] eq $st2[$l]) {
                            $common .= $st1[$k];
                            $result{$common} = 1;;
                    } else {
                        $common = '';
                        last;
                    }
                }             
            }
            $j++;
        }
        $j = 0;
        $i++;
    }
    return keys %result;
}

die "Must supply at least two strings\n" unless @ARGV >= 2;
my %common = map { $_ => 1 } compare2str shift, $ARGV[0];
while (@ARGV > 1) {
    %common = map { $_ => 1 } grep $common{$_}, 
        compare2str shift, $ARGV[0];
}   
my $max = "";
for (keys %common) {
    $max = $_ if length $_ > length $max;
}
say "Largest common substring: $max";

This works as expected. For example:

$ perl  substrings.pl ABABCTO BABCTO ABCTBA
Largest common substring: ABCT

However, it turns out the code is quite complicated for a relatively easy task.

Of course, we could use a CPAN module such as String::LCSS, but that would be cheating: we want to implement the algorithm ourselves. It is certainly quite possible to improve the code above here and there in order to make it more concise and possibly clearer, but I will not do that because I want to test a different approach. I'll come back to that after we've seen our Perl 6 solution.

Longest Common Substring in Perl 6

As mentioned earlier, our approach for the Perl 6 version is to generate all the unique substrings of each input string and then only keep the substrings that are common to all strings. It is then easy to find the longest substring.

To generate all the substrings of a given string, we could probably use the regex engine with the :exhaustive adverb, to get all the overlapping matches, but we'll rather use the rotor built-in subroutine, which isn't mentioned very often although it is very powerful and expressive.

These are two examples using rotor under the REPL:

> 'abcd'.comb.rotor(1);
((a) (b) (c) (d))
> 'abcd'.comb.rotor(2);
((a b) (c d))

In these examples, rotor groups the elements of the invocant into groups of 1 and 2 elements. We're a long way from generating all the substrings of a given string. But we can do better:

> say 'abcd'.comb.rotor($_) for 1..4;;
((a) (b) (c) (d))
((a b) (c d))
((a b c))
((a b c d))

This is already much better, but we're still missing some of the desired substrings such as bc and bcd.

The rotor method can take as parameter a key-value pair, whose value (the second item) specifies a gap between the various matches:

> (1..10).rotor(2 => 1)
((1 2) (4 5) (7 8))

As you can see, we obtain pairs of values, with a gap of 1 between the pairs (item 3, 6 and 9 are omitted from the list. Now, the gap can also be negative and, in that case, we get all successive pairs from the range:

> (1..10).rotor(2 => -1)
((1 2) (2 3) (3 4) (4 5) (5 6) (6 7) (7 8) (8 9) (9 10))

The rotor subroutine can in fact do much more than that (check the rotor documentation), but I've basically shown the features that we'll use here.

The other Perl 6 functionality that we will use here is the the Set type and the associated intersection ( or (&)) operator. This operator does exactly what set intersection do in the mathematical set theory: it returns the elements that are common to the two sets.

We can now code the largest common substring in Perl 6:

use v6;
use Test;

sub substrings (Str $in) {
    my @result = $in.comb;
    append @result,  map { .join('') }, $in.comb.rotor: $_ => 1-$_ for 2..$in.chars;
    return set @result;
}
sub largest-substring (@words) {
    my Set $intersection = substrings shift @words;
    while (my $word = shift @words) {
        $intersection ∩= substrings $word;
    }
    return $intersection.keys.max({.chars});
}
multi MAIN (*@words where *.elems > 1) {
    say largest-substring @words;
}
multi MAIN () {
    plan 2;
    my @words = <ABABC BABCA ABCBA>;
    cmp-ok largest-substring(@words), 'eq', 'ABC', "Testing 3 strings";
    @words = 'abcde' xx 5;
    cmp-ok largest-substring(@words), 'eq', 'abcde', "Testing identical strings";
}

Running the program with no argument to run the tests produces this:

$ perl6  substrings.p6
1..2
ok 1 - Testing 3 strings
ok 2 - Testing identical strings

And with the same parameters as our P5 implementation, we get the same result:

perl6  substrings.p6 ABABCTO BABCTO ABCTBA
ABCT

Longest Common Substring in Perl 5 - Reloaded

The P6 script doing the bulk of the actual work is only 12 lines of code, whereas the equivalent in our initial Perl 5 implementation is 31 lines. Even though the P6 version is using features that don't exist in P5, perhaps we can mimic the same approach:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub substrings {
    my @chars = split //, shift;
    my %substr; # using a hash to remove duplicates
    for my $i (0..$#chars) {
        for my $j ($i..$#chars) {
            $substr{ join '', @chars[$i..$j] } = 1;
        }
    }
    return keys %substr;
}
my %result = map { $_ => 1} substrings shift;
for my $word (@ARGV) {
    %result = map {$_ => 1} grep $result{$_}, substrings $word;
}
my $max = 0;
for (keys %result) {
    $max = $_ if length $_ > length $max;
}
say "Largest common substring: $max";

This prints out:

$ perl  substrings.pl ABABCTO BABCTO ABCTBA
Largest common substring: ABCT

Now, the bulk of the actual work is done in 14 lines of code, less than half the original P5 implementation, and barely more that the P6 implementation. It is quite interesting: the features of P6 gave some ideas on how to solve the problem better in Perl 5.

Priority Queues

Write a script to implement Priority Queue. It is like regular queue except each element has a priority associated with it. In a priority queue, an element with high priority is served before an element with low priority. Please check this wiki page for more informations. It should serve the following operations:

  • 1) is_empty: check whether the queue has no elements.

  • 2) insertwithpriority: add an element to the queue with an associated priority.

  • 3) pullhighestpriority_element: remove the element from the queue that has the highest priority, and return it. If two elements have the same priority, then return element added first.

There are numerous ways to design simple priority queues (at least when performance is not an issue, for instance is the data set isn't very large). For example, it might be sufficient to maintain an array of arrays (AoA), where each of the arrays is a pair containing the item and associated priority. Or an array of hashes (AoH) based on the same idea. This means that each time we want to pull the highest priority element, we need to traverse the whole data structure to find the item with the highest priority. This may be quite slow when there are many items, but this may not matter if our data structure only has a few dozen items.

Another way is to build a hash of arrays (HoA), where the hash keys are the priority and the hash values references to arrays. When the number of priorities is relatively small (compared to the number of items in the queues), this tends to be more efficient, but note that we still need to traverse the keys of the hash until we find the highest priority. An AoA with the index being the priority and the sub-hashes the item might be more efficient (because the priorities remain sorted), but this requires the priorities to be relatively small positive integers. We still have to traverse the top data structure until we find the first non-empty sub-array. We will use this data structure for our Perl 5 implementation.

Another possibility is to use a heap, a data structure that usually has better performance (when it matters). We'll come back to that later.

Priority Queues in Perl 5 (Objects in Functional Programming)

A typical way of implementing queues with their various attached methods would obviously be with object-oriented programming (OOP), using either the standard Perl 5 object system, or any of the more modern variants such as Moose, Moo, Mouse, Mo, or whatever.

But since I'm not a typical person (and since, as those who have read some of my previous posts know by now, I like functional programming), I'll do it in an atypical way and build my objects with functional programming. As you'll see, it's quite simple. Defining my own light-weight object systems and my simple "queue class" takes just one subroutine with only 21 lines of code:

sub new_queue { 
    my @queue;  # an AoA
    my $is_empty = sub {
        for my $item (@queue) {
            next unless defined $item;
            return 0 if @$item > 0;
        }
        return 1;
    };
    my $insert_with_prio = sub {
        my ($item, $prio) = @_;
        push @{$queue[$prio]}, $item;
    };
    my $pull_highest_prio = sub {
        for my $item (reverse @queue) {
            next unless defined $item;
            return shift @$item if @$item > 0;
        }
    };
    return $is_empty, $insert_with_prio, $pull_highest_prio;
}

The new_queue subroutine creates an empty @queue object and returns three coderefs ($is_empty, $insert_with_prio, and $pull_highest_prio). The new_queue subroutine can be considered as a function factory which manufactures three coderefs for the benefit of the caller. Each of the three coderefs is an anonymous function that closes over the @queue AoA (in other words, they are closures).

The code that will call the new_queue "constructor" will only receive in return the three coderefs (let's call them "methods"). It will have no access whatsoever to the @queue object other than using these methods, so that's pretty good encapsulation.

Let's now look at the calling code:

my ($is_empty, $insert, $pull_prio) = new_queue;
for my $num (1..20) {   # inserting 20 items into the queue
    $insert->($num,
        $num % 10 == 0 ? 10 :
        $num % 5  == 0 ? 5  :
        $num % 3  == 0 ? 3  :
        $num % 2  == 0 ? 2  : 
        1);
}        
for my $num (1..20) {
    say $pull_prio->();
}
say "Empty queue" if $is_empty->();

The first code line is a call to the object constructor and the code retrieves the three coderefs (the methods).

The rest of the code above is just for testing purpose. We will insert into our queue 20 numbers (between 1 and 20) and assign them priorities as follows:

  • Priority 10 for numbers evenly divisible by 10;

  • Priority 5 for numbers evenly divisible by 5 (but not 10 and 20);

  • Priority 3 for numbers evenly divisible by 3 (but not 15);

  • Priority 2 for other even numbers;

  • Priority 1 for other numbers.

This is the structure of the @queue after insertion of the 20 items with their priority:

0  undef
1  ARRAY(0x6003018a0)
   0  1
   1  7
   2  11
   3  13
   4  17
   5  19
2  ARRAY(0x600301978)
   0  2
   1  4
   2  8
   3  14
   4  16
3  ARRAY(0x6007fdf90)
   0  3
   1  6
   2  9
   3  12
   4  18
4  undef
5  ARRAY(0x600842a88)
   0  5
   1  15
6  undef
7  undef
8  undef
9  undef
10  ARRAY(0x600842bf0)
   0  10
   1  20

So, when running this code, we should get the items in the following order: 10, 20, 5, 15, numbers evenly divided by 3 (except 15) in ascending order, etc. And we should finally get a "Empty queue" message if we have exhausted the queue. Let's see:

$ perl  queues.pl
10
20
5
15
3
6
9
12
18
2
4
8
14
16
1
7
11
13
17
19
Empty queue

Looks good, doesn't it?

Priority Queues in Perl 6

For the Perl 6 implementation, we want use a binary heap to store the data. But this requires quite a bit of explanations, and this blog post is going to get too long.

So, I'll stop here for now and prepare another blog post for that part in the next couple of days. Stay tuned if you want to know about priorities queues using binary heaps.

Update: This is the new post on heaps and priority queues in Perl 6 (and also in Perl 5)

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 4. And, please, also spread the word about the Perl Weekly Challenge if you can.

And see you hopefully at the European Perl Conference in Riga, Aug. 7 to 9, 2019.

Perl Weekly Challenge # 17: Ackermann Function and Parsing URLs

These are some answers to the Week 17 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (July 21, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Ackermann Function

The common Ackermann function, named after Wilhelm Ackermann, is defined recursively as follows:

A(m, n) = n + 1                  if m = 0
A(m, n) = A(m - 1, 1)            if m > 0 and n = 0
A(m, n) = A(m - 1, A(m, n - 1))  if m > 0 and n > 0

The function grows very rapidly even for relatively small input values, so you may not want to try to compute it for input values such as 5 and 6. You probably don't even want to try it with the first input parameter (m) larger than 3. It also does a very large number of recursive calls, so that it will tend to be very slow.

And I don't think it is possible to cache the results or to memoize the function and obtain really significant performance enhancements, because, as far as I can say, the ack function is relatively rarely called more than twice or three times with the same arguments (except for some very small values). I tried anyway to memoize the P5 ack function, just in case my analysis was wrong, and did not get a large improvement (for some input values, the memoized version even ran slower). So, using the Memoize module may make the function slighly faster, but it will not really make it possible to call it witl larger arguments, because it would still require a truly huge number of computations.

Ackermann Function in Perl 6

This is a very simple implementation of the Ackermann function in Perl 6:

sub ack (Int $m, Int $n) {
    return $n + 1 if $m == 0;
    return ack $m - 1, 1 if $n == 0;
    return ack $m - 1, ack $m, $n-1;
}
say ack 3, 4;

This prints 125.

Note that we don't need parentheses for the ack subroutine calls, not even in the case of the double call on the third return statement: Perl 6 knows that ack requires two arguments and successfully manage to parse the calls correctly. That being said, you might prefer for clarity to add some parentheses on that last return statement (but remember there shouldn't be any space between the subroutine call and the opening parenthesis of the arguments):

    return ack($m - 1, ack($m, $n-1) );

OK, job done, it seems. But we can play with it a bit more and have some fun with some distinctive features of Perl 6.

Using Sigilless Variables

First, Perl 6 supports sigilless variables (I personally tend to prefer variables with sigils, but sigilless variables will be useful here), so that we can define our function almost exactly in the terms of the challenge requirement:

sub A (Int \m, Int \n) {
    return n + 1 if m == 0;
    return A(m - 1, 1) if n == 0; # m > 0 if we get here
    return A(m - 1, A(m, n - 1)); # m and n > 0 if we get here
}
say A 3, 4;

Note that the value of a sigilless variable (which is in fact an alias to the value assign to it) cannot be changed, but that's OK since we never modify either m or n, but only create new local versions of them on each recursive call.

Using Multi Subroutines

Perl 6 allows for writing several routines with the same name but different signatures. They are introduced with the multi keyword. We can use this feature to write three versions of the ack function, one for each of the conditionals:

use v6;
subset Positive of Int where * >= 0;

multi ack (0, Positive \n) { n + 1 }
multi ack (Positive \m, 0) { ack(m - 1, 1) }
multi ack (Positive \m, Positive \n) {
    ack(m - 1, ack(m, n - 1))
}
sub MAIN (Positive \m, Positive \n) {
    say ack +m, +n;
}

We now need to pass the values of m and n to the program:

$ perl6 ackermann2.pl6 3 4
125

I originally wanted to declare three multi MAIN subroutines (and that's feasible), but that turned out to be impractical in this case because I did not know where to put the print statement to display the result, which has to occur only after the cascade of recursive calls is completed.

Ackermann Function in Perl 5

Translating our initial P6 implementation into Perl 5 is very easy:

#!/usr/bin/perl
use strict;
use warnings;
no warnings 'recursion';
use feature qw/say/;

sub ack {
    my ($m, $n) = @_;
    return $n + 1 if $m == 0;
    return ack($m - 1, 1) if $n == 0;
    return ack($m - 1, ack($m, $n-1));
}
say ack 3, 4;

This prints again 125.

By the way, please note that the Perl 5 ack subroutine just above works perfectly fine in Perl 6. Although it is far more idiomatic to use signatures for managing subroutine parameters, it is still possible in Perl 6 to use the @_ array for retrieving the parameters passed to a subroutine. The P5 and P6 languages aren't so different, after all.

Also note that we had to suppress the recursion warning, because the recursion stack exceeds 100 calls with the arguments used in the example.

Challenge # 2: Parsing URLs

Create a script to parse URL and print the components of URL. According to Wiki page, the URL syntax is as below:

scheme:[//[userinfo@]host[:port]]path[?query][#fragment]

For example: jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1

scheme:   jdbc:mysql
userinfo: user:password
host:     localhost
port:     3306
path:     /pwc
query:    profile=true
fragment: h1

Parsing URLs in Perl 6

The right way to parse a URL is of course to write a grammar, and this is great fun in Perl 6.

I can't explain in detail how this works (as this would take many pages), but let me try to give an idea. First, we define a grammar, with a number of rules or tokens. When we use a grammar with the parse method on a string, the parse method calls by default the TOP rule (or token) and tries to match this rule:

rule TOP { 
    <scheme> '//' <authority>? <path> [ '?' <query> ]? <fragment>? 
}

This rules looks for a <scheme> component, followed by //, followed by an optional <authority> component, followed by a <path>component, etc. A <scheme> component is defined by the scheme token:

token scheme { \w+ [ ':' \w+]? ':'}

which is composed of a group of alphanumerical characters, followed by an optional colon and another group of alphanumerical characters, and ending with a colon.

Similarly, the optional <authority> component, if it exists, is defined by the authority token:

token authority { [<userinfo> '@']? <host> [':' <port>]? }

An authority component is itself composed of an optional <userinfo>and @, followed by a <host>, followed by an optional colon and a <port>. As you probably guessed by now, these three sub-components will be defined in other tokens, and so on.

In the end, if the string is successfully parsed, the result is stored into the $/ match object. We can then lookup for the various components with a hash-like syntax on the match object. So for example, the <scheme> component can be found in $/<scheme> or, in the code below, in $match<scheme>, since I assigned the parse result (the match object) to the $match variable.

Now the full code of the program:

use v6;
# use Grammar::Tracer;

grammar URL {
    rule TOP { 
        <scheme> '//' <authority>? <path> [ '?' <query> ]? 
        <fragment>? 
    }
    token scheme { \w+ [ ':' \w+]? ':'}
    token authority { [<userinfo> '@']? <host> [':' <port>]? }
    token userinfo { <user> [':' <password> ]?}
    token user { \w+ }
    token password { <-[ @ ]>+ }
    token host { 
        || <hostname>
        || <ipv4>
        || <ipv6>
    }
    token hostname { \w+ [ '.' \w+ ]* } 
    token ipv4 { <octet> ['.' <octet> ] ** 3 }
    token octet { (\d ** 1..3) <?{0 <= $0 <= 255 }>}
    token ipv6 { '[' <group> ** 8 % \: ']' }
    token group { <xdigit> ** 4 }
    token port { \d+ }
    token path { '/' <segment>? [ '/' <segment> ]* } 
    token segment { \w+ [ '::' \w+ ]? }
    token query {  \w+ '=' <[\w\s]>+ }
    token fragment { '#' <frag_id> }
    token frag_id { \w+ }
}
sub display (Str $label, Str $value) {
    printf "    %-15s:\t %-20s\n", $label, $value;
}

for ('jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1',
     'https://en.wikipedia.org/wiki/URL', 
     'https://perlcon.eu/my', 
     'https://www.perlmonks.org/?node=Seekers of Perl Wisdom',
     'https://metacpan.org/pod/Test::More'
    ) -> $url-string {
    my $match = URL.parse($url-string);
    if $match {
        say "Matched $url-string:";
        display "scheme", ~$match<scheme>;
        display "userinfo", ~$match<authority><userinfo> 
            if defined $match<authority><userinfo>;
        display "host", ~$match<authority><host> 
            if defined $match<authority><host>;
        display "port", ~$match<authority><port> 
            if defined $match<authority><port>;
        display "path", ~$match<path>;
        display "query", ~$match<query>
            if defined $match<query>;
        display "fragment", ~$match<fragment> 
            if defined $match<fragment>;
    } else {
        say "Not matched $url-string";
    }
}

And this is the output with our five test URLs:

perl6  parse_url.p6
Matched jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1:
    scheme         :         jdbc:mysql:
    userinfo       :         user:password
    host           :         localhost
    port           :         3306
    path           :         /pwc
    query          :         profile=true
    fragment       :         #h1
Matched https://en.wikipedia.org/wiki/URL:
    scheme         :         https:
    host           :         en.wikipedia.org
    path           :         /wiki/URL
Matched https://perlcon.eu/my:
    scheme         :         https:
    host           :         perlcon.eu
    path           :         /my
Matched https://www.perlmonks.org/?node=Seekers of Perl Wisdom:
    scheme         :         https:
    host           :         www.perlmonks.org
    path           :         /
    query          :         node=Seekers of Perl Wisdom
Matched https://metacpan.org/pod/Test::More:
    scheme         :         https:
    host           :         metacpan.org
    path           :         /pod/Test::More

Undoubtedly, many more tests would be needed, as URLs can take many forms and we tested only a few cases.

Parsing URLs in Perl 5

We don't have grammars in Perl 5, so we will use regexes. But regexes are far less powerful than grammars, so we will have to be much less ambitious: we will not really try to validate full URLs, but only try to extract the components.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

for my $url ( 'jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1',
    'https://en.wikipedia.org/wiki/URL', 
    'https://perlcon.eu/my',
    'https://www.perlmonks.org/?node=Seekers of Perl Wisdom') {

    $url =~ m{
        ^                       # start of string
        (\w+ (?: : \w+)?)       # scheme, captured in $1
        ://                     # literal ://
        (?:(\w+:\w+)@)?         # optional user info captured in $2
        (\w+ (?: \.\w+)*)       # host, captured in $3
        (?: : (\d+) )?          # optional port captured in $4
        (/(?:\w+ (?:/\w+)*)?)   # path, captured in  $5
        (?: \? (\w+=[\s\w]+))?  # optional query in $6
        (?: \# (\w+))?          # optional fragment in $7
        $                       # end of string
        }x;

    say "Matched $url:";
    say "   scheme: $1";
    say "   userinfo: $2" if defined $2;
    say "   host: $3";
    say "   port: $4" if defined $4;
    say "   path: $5";
    say "   query: $6" if defined $6;
    say "   fragment: $7" if defined $7;
}

Okay, this large regex may be a bit difficult to read and it would be quite hard to update it if any thing needs to be changed, but it works: this program displays the following output:

Matched jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1:
   scheme: jdbc:mysql
   userinfo: user:password
   host: localhost
   port: 3306
   path: /pwc
   query: profile=true
   fragment: h1
Matched https://en.wikipedia.org/wiki/URL:
   scheme: https
   host: en.wikipedia.org
   path: /wiki/URL
Matched https://perlcon.eu/my:
   scheme: https
   host: perlcon.eu
   path: /my
Matched https://www.perlmonks.org/?node=Seekers of Perl Wisdom:
   scheme: https
   host: www.perlmonks.org
   path: /
   query: node=Seekers of Perl Wisdom

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, July 28. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 16: Bitcoin Addresses

In this post dated July 8, I provided solutions to the first challenge of Week 16 of the Perl Weekly Challenge organized by Mohammad S. Anwar. At the time, I admitted that I was having trouble understanding the requirement for the second challenge. After quite a bit of googling, I can finally provide some answers to the second challenge of Week 16.

I should acknowledge that I spent far more time on this than I would wish to admit. Far more time than I ever did on any of the previous challenges.

Validation of Bitcoin Addresses

Write a script to validate a given bitcoin address. Most Bitcoin addresses are 34 characters. They consist of random digits and uppercase and lowercase letters, with the exception that the uppercase letter “O”, uppercase letter “I”, lowercase letter “l”, and the number “0” are never used to prevent visual ambiguity. A bitcoin address encodes 25 bytes. The last four bytes are a checksum check. They are the first four bytes of a double SHA-256 digest of the previous 21 bytes. For more information, please refer [wiki]'https://en.bitcoin.it/wiki/Address) page. Here are some valid bitcoin addresses:

1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2
3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy

I did not know anything about Bitcoins (and still don't), so I just tried to apply the rules described in the challenge. But it seemed that they were not complete, or maybe I missed something important. Anyway, I wasn't able at the time to even understand the requirement.

I tried to investigate the matter further.

Some parts of the code examples given below are partly copied or derived from other code samples found on the Internet. I readily admit that I did not design them entirely from scratch.

Base-58 encoding

The first thing to understand is that these bitcoin addresses are encoded in base-58. You need to decode them before you can split the bytes between the value itself and the checksum and apply a double SHA256 digest.

The characters used in this encoding are (in this specific order) :

123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

i.e. alphanumerical characters except "O", "I", "l" and "0" (see the Wikipedia page on Base 58.

In Base 10 (using the 10 digits between 0 and 9), the first thing we need to do is to understand that a number such as 42 means:

4 * (10 ** 1) + 2 * (10 ** 0)

Similarly, in base 58, the "1BvB" string that starts our first valid address uses the following translation table:

1   =>  0
B   =>  10
v   =>  53
B   =>  10

Transforming such a base-58 string into a number goes like this:

0 * (58 ** 3) + 10 * (58 ** 2) + 53 * (58 ** 1) + 10 * (58 ** 0)

Note: the parentheses are not necessary (in Perl), but I suppose they make the thing clearer.

So, the first thing we need to do is to transform the input base-58 address into a binary object (an unsigned integer).

This will be a fairly large number (typically 58 or 59 digits). This is a number too large to be managed correctly in core Perl 5. So we will start with Perl 6 (which can handle arbitrary large integers), and see later how we can adapt the script in P5.

Caveat: I'm really new to bitcoins. The examples I provide below seem to work correctly, but, please, don't use the code examples below for any actual validation of bitcoin addresses. Use established libraries instead. There are probably edge cases where my code doesn't work properly. Also, a proper validation package should check the length of the input string, but I did not implement length validation because the information I found about that was somewhat contradictory.

Bitcoin Address Validation in Perl 6

This is my attempt at validating a bitcoin address in Perl 6:

use v6;
use Digest::SHA;
use Test;
plan 4;

my @base58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz".comb;
    # https://en.wikipedia.org/wiki/Base58

my %base58 = map { @base58[$_] => $_}, 0 .. 57;
sub b58toraw (Str $address) {
    my UInt $num = 0;
    my $i = 0;
    for $address.comb.reverse -> $letter {
        $num += %base58{$letter} * 58**$i++;
    }
    my @bytes = $num.base(16).fmt("%050s").comb(2) ;
    my $buff = Buf.new(map {.fmt("%02s").parse-base(16)}, @bytes);
    return $buff;
}

sub check($address) {
    return False if $address ~~ /<[O0lI]>/;
    my $raw = b58toraw $address;
    my $four-last-bytes = $raw.subbuf(21, 4);
    my $double-digest = sha256( sha256 $raw.subbuf(0, 21));
    return  True if $double-digest.subbuf(0, 4) eq $four-last-bytes;
    False; 
}
ok  check("1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2"), "Correct Address 1";
nok check("1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVw2"), "Incorrect Address 1";
ok  check("3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"), "Correct Address 2";
nok check("3k98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"), "Incorrect Address 2";

which duly prints:

$ perl6 bitcoin.p6
1..4
ok 1 - Correct Address 1
ok 2 - Incorrect Address 1
ok 3 - Correct Address 2
ok 4 - Incorrect Address 2

In the event that you're interested, the value of the large $num integer for The first correct address is:

2936256236368367529205845895214681571805604556586060445291

and it is composed of the following bytes:

[00 77 BF F2 0C 60 E5 22 DF AA 33 50 C3 9B 03 0A 5D 00 4E 83 9A F4 15 76 6B]

Bitcoin Address Validation in Perl 5

In the course of my tests, I ended up with at least four versions of the subroutine to decode the base-58 input string.

One of them, which is loosely inspired from a CPAN module but does not work, used the Math::BigInt module to do the initial computing:

# CAUTION: does not work properly
sub decode_base58 {
    my $in = shift;
    my $decoded = Math::BigInt->new(0);
    my $multi   = Math::BigInt->new(1);
    my $base    = @base58;

    while (length $in > 0) {
        my $digit = chop $in;
        $decoded->badd($multi->copy->bmul($base58{$digit}));
        $multi->bmul($base);
    }
    return $decoded->to_base(256);
}

The code above simply does not compile because, for some reason, the version of Math::BigInt on my system does not appear to know the to_base (or, for that matter, the to_bytes) method:

"to_base" is not exported by the Math::BigInt module

I have no idea why these methods are unknown on my systems (I tried on both Linux and Windows), but I did not investigate much further and decided to write manually my own version of the calculation. To do this, we need to use an array for storing the decoded bytes. Whenever an item in the array becomes larger than a byte (more than 255), we normalize it to a byte (with the modulo operator) and carry over the excess to the next item.

#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw/sha256/;
use Test::More tests => 4;

my @base58 = split //, 
    "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    # https://en.wikipedia.org/wiki/Base58
my %base58 = map { $base58[$_] , $_} 0 .. 57;

sub base58_to_bytes {
    use integer;
    my $input = shift;
    my @out;
    for my $letter ( split //, $input ) {
        $_ *= 58 for @out; # new letter, multiply previous values by the base
        $out[0] += $base58{$letter};
        for my $index ( 0 .. $#out ) {
            my $val = $out[$index];
            if ($val > 255) {
                $out[$index] = $val % 256;  # normalize current byte 
                $out[$index + 1] += $val / 256; # carry over to next
            }
        }
    }
    $out[$_] //= 0 for 0 .. 24;  # padding empty slots
    return reverse @out;
}
sub check_address {
    my $address = shift;
    die "Forbidden character" if $address =~ /[O0lI]/;
    my @byte = base58_to_bytes $address;
    return 0 unless 
        (pack 'C*', @byte[21..24]) eq 
        substr sha256(sha256 pack 'C*', @byte[0..20]), 0, 4;
    return 1;
}

is check_address("1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2"), 1, "Correct Address 1";
is check_address("1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVw2"), 0, "Incorrect Address 1";
is check_address("3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"), 1, "Correct Address 2";
is check_address("3J99t1WpEZ73CNmQviecrnyiWrnqRhWNLy"), 0, "Incorrect Address 2";

The tests pass correctly:

$ perl  bitcoin2.pl
1..4
ok 1 - Correct Address 1
ok 2 - Incorrect Address 1
ok 3 - Correct Address 2
ok 4 - Incorrect Address 2

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, July 21. And, please, also spread the word about the Perl Weekly Challenge if you can.

# Perl Weekly Challenge # 16: Pythagoras Pie

These are some answers to the Week 16 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (July 14, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Pythagoras Pie

At a party a pie is to be shared by 100 guest. The first guest gets 1% of the pie, the second guest gets 2% of the remaining pie, the third gets 3% of the remaining pie, the fourth gets 4% and so on. * Write a script that figures out which guest gets the largest piece of pie. (Challenge proposed by Jo Christian Oterhals)*

The first guest gets 1% of the whole pie. The second guest gets 2% of 99%, i.e. 1.98%. And so on: each guest gets in turn a larger share of what is left of the pie, but, at the same time, what is left of the pie gets smaller and smaller. So, intuitively, there must be a point where the share sizes start diminishing. If there was a 101st guest, she would get nothing, since the 100th guest took everything that is left (in fact a very tiny share).

Using a One-Liner Script to Display Each Guest's Share of the Pie

It is quite easy to write a Perl 5 one-liner to display the share of each guest. Given that the share become really minuscule after a while, I'll print the shares only for the first 50 guests. In the following one-liner script, $p represents each guest, $r the fraction of the pie that remains at any given point, and $sh is a constant representing the share. We display the guest number, what remains of the pie and the share taken by the guest:

$ perl -E '$r = 1; $sh = .01; for $p (1..50) {printf "%i\t%0.10f\t%0.10f\n", $p, $r, $r*$p*$sh; $r -=  $r*$p*$sh; }'
1       1.0000000000    0.0100000000
2       0.9900000000    0.0198000000
3       0.9702000000    0.0291060000
4       0.9410940000    0.0376437600
5       0.9034502400    0.0451725120
6       0.8582777280    0.0514966637
7       0.8067810643    0.0564746745
8       0.7503063898    0.0600245112
9       0.6902818786    0.0621253691
10      0.6281565096    0.0628156510
11      0.5653408586    0.0621874944
12      0.5031533642    0.0603784037
13      0.4427749605    0.0575607449
14      0.3852142156    0.0539299902
15      0.3312842254    0.0496926338
16      0.2815915916    0.0450546547
17      0.2365369369    0.0402112793
18      0.1963256577    0.0353386184
19      0.1609870393    0.0305875375
20      0.1303995018    0.0260799004
21      0.1043196015    0.0219071163
22      0.0824124852    0.0181307467
23      0.0642817384    0.0147847998
24      0.0494969386    0.0118792653
25      0.0376176733    0.0094044183
(Rest of the display omitted for brevity)

So, it turns out that the 10th guest gets the largest share of the pie (6.28%).

Of course, we can do more or less the same in Perl 6:

$ perl6 -e 'my $r = 1; for 1..50 -> $p {printf "%i\t%0.10f\t%0.10f\n", $p, $r, $r*$p*.01; $r -=  $r*$p*.01;}'
1       1.0000000000    0.0100000000
2       0.9900000000    0.0198000000
3       0.9702000000    0.0291060000
4       0.9410940000    0.0376437600
5       0.9034502400    0.0451725120
6       0.8582777280    0.0514966637
7       0.8067810643    0.0564746745
8       0.7503063898    0.0600245112
9       0.6902818786    0.0621253691
10      0.6281565096    0.0628156510
11      0.5653408586    0.0621874944
12      0.5031533642    0.0603784037
13      0.4427749605    0.0575607449
14      0.3852142156    0.0539299902
15      0.3312842254    0.0496926338
(Rest of the display omitted for brevity)

But we're cheating a bit with these one-liners. The challenge says: * Write a script that figures out which guest gets the largest piece of pie.* Our one-liners display the share and it is the human person reading the output that really figures out which share is the largest.

We could still do it with a one-liner, for example in Perl 5:

$ perl -E '$r = 1; $sh = .01; $max_sh = 0; for $p (1..100) { 
    ($max_p, $max_sh) = ($p, $r*$p*$sh) 
        if $r*$p*$sh > $max_sh; $r -= $r*$p*$sh
    } 
    say "$max_p\t$max_sh";'
10      0.0628156509555295

But I'm afraid this is now becoming a bit too hairy, it will be cleaner to write real scripts.

Real Scripts for Finding the Largest Share

Since we've already done it with one-liner, writing a full-fledged script in Perl 5 is not complicated:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use constant SHARE_FACT => 0.01;

my $rest_of_the_pie = 1;
my ($max_guest, $max_sh) = (1, 0);
for my $guest_nr (1..100) {
    my $share = $rest_of_the_pie * $guest_nr * SHARE_FACT;
    ($max_guest, $max_sh) = ($guest_nr, $share) if $share > $max_sh;
    $rest_of_the_pie -= $share;
}
say "Lucky guest: $max_guest \tLargest share: $max_sh";

This prints the following output:

$ perl pythagoras_pie.pl
Lucky guest: 10         Largest share: 0.0628156509555295

We can directly translate the algorithm in Perl 6:

use v6;

constant $share-fact = 0.01;
my $rest-of-the-pie = 1;
my ($max-guest, $max-sh) = 1, 0;
for 1..100 -> $guest-nr {
    my $share = $rest-of-the-pie * $guest-nr * $share-fact;
    ($max-guest, $max-sh) = ($guest-nr, $share) if $share > $max-sh;
    $rest-of-the-pie -= $share;
}
say "Lucky guest: $max-guest \tLargest share: $max-sh";

But we can do something much more concise in Perl 6 using the sequence operator, functional programming and some built-in functions:

use v6;

my $rest = 1;
my @shares = map { my $sh = $rest * $_; $rest -= $sh; $sh}, 
    (0, .01 ... 1); 
say  map { $_, @shares[$_] }, @shares.keys.max({@shares[$_]});

which prints out:

((10 0.06281565095552947))

We start with the sequence operator ... to build a list of 101 relative shares and use a map statement to build @shares, the list of final shares of the original pie. Then we use the max routine to find the index of the largest value, and finally print the index and the value. Note that we started the original sequence with 0, although this is useless for the computations, because this makes it possible to use the array index as a rank (otherwise, the script would have printed 9, instead of 10, for the rank of the lucky guest).

Another approach is to build directly the @shares array with the sequence operator and a generator (i.e. a code block to generate the next item from the previous one):

use v6;

my ($rest, $a) = 1, .01;
my @shares = 0, .01, -> $b {$rest -= $b; $a += .01; $rest * $a} … *;
say  map {$_, @shares[$_]}, @shares[0..100].keys.max({@shares[$_]});

Validation of Bitcoin Addresses

Write a script to validate a given bitcoin address. Most Bitcoin addresses are 34 characters. They consist of random digits and uppercase and lowercase letters, with the exception that the uppercase letter “O”, uppercase letter “I”, lowercase letter “l”, and the number “0” are never used to prevent visual ambiguity. A bitcoin address encodes 25 bytes. The last four bytes are a checksum check. They are the first four bytes of a double SHA-256 digest of the previous 21 bytes. For more information, please refer [wiki]'https://en.bitcoin.it/wiki/Address) page. Here are some valid bitcoin addresses:

1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2
3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy

I do not know anything about Bitcoins, so I just tried to apply the rules described in the challenge. But it seems that they are not complete, or maybe I missed something important.

I'll try to investigate the matter further and to come back to this later in the week (Update: see this new post).

Functional Programming in Perl: Strong and Weak Primes (Perl Weekly Challenge)

In this other blog post, I provided some answers to Week 15 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Here, I want to use the opportunity of this challenge to illustrate once more some possibilities of functional programming in Perl (both Perl 5 and Perl 6) using the example of the first challenge of this week. I have already covered some aspects of functional programming in Perl in a post related to Weekly Challenge 9.

The challenge was about displaying the first 10 strong and weak prime numbers. A strong prime is a prime number that is greater than the arithmetic mean of the nearest prime above and below (in other words, it's closer to the following than to the preceding prime). A weak prime is a prime number that is less than the arithmetic mean of the nearest prime above and below. Obviously, a prime number cannot both strong and weak, but some prime numbers, such as 5 or 53 (we'll see more of them later), are neither strong, nor weak (they're called balanced primes): 5 is equal to the arithmetic mean of 3 and 7. Therefore, the fact that a prime is not strong doesn't mean that it is weak.

The Functional Solutions of my Initial Blog Post

The solutions I suggested in the other post on this challenge are in fact largely functional in spirit. One of them in Perl 6:

my @p = grep { .is-prime }, 1..*;   # Lazy infinite list of primes
say "Strong primes: ", (map { @p[$_] }, 
    grep { @p[$_] > (@p[$_ - 1] + @p[$_ + 1]) / 2 }, 1..*)[0..9];
say "Weak primes: ", (map { @p[$_] }, 
    grep { @p[$_] < (@p[$_ - 1] + @p[$_ + 1]) / 2 }, 1..*)[0..9];

And more or less the same idea in Perl 5:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub is_prime{
    my $num = shift;
    for my $i (2 .. $num ** .5) {
        return 0 if $num % $i == 0;
    }
    return 1;
}
my @p = grep is_prime($_), 1..105;
my @strong = map $p[$_], 
    grep { $p[$_] - $p[$_-1] > $p[$_+1] - $p[$_] } 1..25;
my @weak = map $p[$_], 
    grep { $p[$_] - $p[$_-1] < $p[$_+1] - $p[$_] } 1..25;
say "Strong: @strong[0..9]";
say "Weak: @weak[0..9]";

In both cases, we're using a data pipeline programming model. The code lines with map and grep statements should be read from bottom to top (when they are formatted over more than one line) and from right to left. For example, to understand this code line:

my @strong = map $p[$_], 
    grep { $p[$_] - $p[$_-1] > $p[$_+1] - $p[$_] } 1..25;

one needs to start from the 1..25 range, which is fed to the grep statement, whose role is to filter the range values and keep those which satisfy the condition within the grep block. This means, in this case, to keep the indices of values in the @p array of prime numbers for which the current prime number is closer to the next prime than to the previous prime. These indices are then fed to the map statement in order to populate the @strong array with such primes.

Both Perl5 and Perl 6 implementations work fine, but there are a couple of weaknesses. One is that we're scanning the @p array of prime numbers twice. If we wanted to also identify the balanced primes, we probably would to traverse the list of primes a third time. The second weakness, at least for the Perl 5 implementation, is that is that it is not easy to figure out in advance the range of values on which we want to work. The Perl 6 implementation does not have this second problem because it uses a lazy infinite list of primes, meaning that Perl 6 will only generate the primes that we are requesting elsewhere in the rest of the program.

Note that these weaknesses don't matter very much, since we're dealing with small ranges anyway, but that's not really satisfactory to the mind, as we know that these programs wouldn't scale too well for larger ranges. Let's see whether we can improve these programs.

A Non Functional Implementation

An easy solution to both problems is to write a non (or less) functional solution using an infinite loop.

For example, in Perl 6, this could be something like this:

use v6;

my @p = grep { .is-prime }, 1..*;   # Lazy infinite list of primes
my (@strong, @weak, @balanced);
for 1..* -> $i {
    if @p[$i] > (@p[$i - 1] + @p[$i + 1])/2 { 
        push @strong, @p[$i];
    } 
    elsif @p[$i] < (@p[$i - 1] + @p[$i + 1])/2 {
        push @weak, @p[$i];
    } else {
        push @balanced, @p[$i];
    }
    last if @balanced.elems >= 10;
}
say "Strong primes: @strong[0..9]";
say "Weak primes: @weak[0..9]";
say "Balanced primes: @balanced[]";

This script produces the following output:

$ perl6 strong_primes.p6
Strong primes: 11 17 29 37 41 59 67 71 79 97
Weak primes: 3 7 13 19 23 31 43 47 61 73
Balanced primes: 5 53 157 173 211 257 263 373 563 593

Note that in the code above, we use the fact that we know from previous tests that balanced primes are much less frequent than either strong or weak primes, so that we can stop the for loop when we have 10 balanced primes. Also note that this works fine because, in Perl 6, a forloop is lazy. There may be some possible minor improvements, for example avoiding multiple dereferencing of the @p values (see e.g. the P5 script below), but I'm not really interested here with micro-optimizations.

From now on, we will drop the 4 boiler plate code lines at the P5 script's beginning (the use ... lines) to avoid repetition in each code example, but they are of course necessary in any Perl 5 script (except possibly some simple one-liners). This is the same idea in Perl 5 (with a slightly different algorithm):

sub is_prime{
    my $num = shift;
    for my $i (2 .. $num**.5) {
        return 0 if $num % $i == 0;
    }
    return 1;
}

my @p = grep is_prime($_), 2..600;
my (@strong, @weak, @balanced);
my $i = 0;
while (1) {
    $i++;
    my ($q, $r, $s) = @p[$i-1..$i+1]; 
    if ($r - $q > $s - $r) {
        push @strong, $r;
    } elsif ($r - $q < $s - $r) {
        push @weak, $r;
    } else {
        push @balanced, $r;
    }
    last if @balanced >= 10;
}
say "Strong primes: @strong[0..9]";
say "Weak primes: @weak[0..9]";
say "Balanced primes: @balanced";

These new P6 and P5 versions work fine, but that's really not the way I want to go: I would like to have a more functional version, not a less functional version. And, in the Perl 5 version, we still have to hard-code the range of the prime array.

Let's start with the specific Perl 5 problem, which is due to the fact that Perl 5 does not have built-in support for lazy infinite lists. As discussed in a post related to Weekly Challenge 9, we can still approach lazy infinite lists in Perl 5 with the help of iterators. I'll briefly explain again iterators here, but you might find more details about them by following the above link.

Anonymous Subroutines, Closures and Iterators In Perl 5

Rather than starting by populating an array of primes, we will use an iterator to produce the next wanted primes on demand.

Most programmers commonly use iterators, sometimes not knowing that it's called this way. For example, when you read a file line by line with a construct such as:

while (my $line = <$FH>) {
    # do something with $line
}

you're actually using an iterator.

An iterator is a function that returns values and keeps track of the last returned value to find out the next one. What we want here is a function that returns primes one by one, so that we don't need to compute values that are not needed. In our case, we would need a function that "remembers" the last prime it has found and "knows" how to find the next one.

In the program below, the generate_prime_iterator subroutine is a function generator that returns an anonymous subroutine; this anonymous subroutine acts as an iterator on successive primes. This anonymous subroutine is a closure that updates and keeps track of the last_val variable in the context of which the anonymous code reference is created.

The main code below calls the generate_prime_iterator subroutine once and stores the returned value (the anonymous code reference) into the $give_me_a_prime code reference. In the second line of the infinite while loop, the program calls the $give_me_a_prime code reference until it has enough primes (until $p[$i+1] is populated) for the current iteration. The first time through the while loop, the $give_me_a_prime code reference is called three times; in the following iterations, the program needs to call it only once, in order to get the next needed prime. The last statement near the end of the while loop breaks out of the infinite loop when there are enough items in the @balanced array.

sub is_prime{
    my $num = shift;
    for my $i (2 .. $num**.5) {
        return 0 if $num % $i == 0;
    }
    return 1;
}
sub generate_prime_iterator {
    my $last_val = shift // 1;
    return sub {
        do {
            $last_val ++;
        } until is_prime $last_val;
        return $last_val;
    }
}
my $give_me_a_prime = generate_prime_iterator 1;
my (@strong, @weak, @balanced, @p);
my $i = 0;
while (1) {
    $i++;
    push @p, $give_me_a_prime->() while not defined $p[$i+1];
    my ($q, $r, $s) = @p[$i-1..$i+1]; 
    if ($r - $q > $s - $r) {
        push @strong, $r;
    } elsif ($r - $q < $s - $r) {
        push @weak, $r;
    } else {
        push @balanced, $r;
        last if @balanced >= 10;
    } 
}
say "Strong primes: @strong[0..9]";
say "Weak primes: @weak[0..9]";
say "Balanced primes: @balanced";

This has become a bit more complicated than before, but we no longer need to estimate the number of primes we'll need, and we will not generate any useless prime. We've achieved more or less what we were doing in Perl 6 with a lazy infinite list. It just takes a bit more code to do that.

Categorizing or Classifying Primes In Perl 6

In the Perl 6 section of the initial blog post, we tried to use the categorize or classify built-in subroutines to avoid traversing several times the list of primes. The code was along these lines:

my @p = grep { .is-prime }, 1..*;   # Lazy infinite list of primes
sub mapper(UInt $i) {
    @p[$i] > (@p[$i - 1] + @p[$i + 1])/2 ?? 'Strong' !!
    @p[$i] < (@p[$i - 1] + @p[$i + 1])/2 ?? 'Weak'   !!
    'Balanced';
}
my %categories = classify &mapper, 1..120;
for sort keys %categories -> $key {
    say "$key primes:  ", map {@p[$_]}, %categories{$key}[0..9];
}

This worked fine, but we ran into the problem that the categorize and classify functions don't support infinite lists, so that we needed again to estimate the number of input values we would need to get the desired output.

Let's see if we can write our own version of classify which would have the same calling syntax and be able to handle infinite lists. Our version of the classify function will be called distribute.

my @p = grep { .is-prime }, 1..*;   # Lazy infinite list of primes
sub mapper(UInt $i) {
    @p[$i] > (@p[$i - 1] + @p[$i + 1])/2 ?? 'Strong' !!
    @p[$i] < (@p[$i - 1] + @p[$i + 1])/2 ?? 'Weak'   !!
    'Balanced';
}
sub distribute (&code, @primes) {
    my %distribution;
    for @primes.kv -> $key, $val {
        next if $key == 0;
        push %distribution{&code($key)}, $val;
        last if %distribution{'Balanced'}.elems >= 10;
    }
    return %distribution;
}
my %categories = distribute &mapper, @p;
for sort keys %categories -> $key {
    say "$key primes:  ", %categories{$key}[0..9];
}

Note that, contrary to the previous version, the %distribution and %categories hashes now contain the primes, not the prime indices in the prime array.

And we no longer need to estimate the number of input values to get the desired output.

I am not fully satisfied, though, because the stopping condition (last if ...) is hard-coded in the distribute subroutine, which is therefore very specific to the problem, whereas I would like the distribute subroutine to be more generic. Well, we can pass to distribute a third argument, another code block, to stop the iteration:

my @p = grep { .is-prime }, 1..*;   # Lazy infinite list of primes
sub mapper(UInt $i) {
    @p[$i] > (@p[$i - 1] + @p[$i + 1])/2 ?? 'Strong' !!
    @p[$i] < (@p[$i - 1] + @p[$i + 1])/2 ?? 'Weak'   !!
    'Balanced';
}
sub distribute (&code, @primes, &stopper) {
    my %distribution;
    for @primes.kv -> $key, $val {
        next if $key == 0;
        push %distribution{&code($key)}, $val;
        &stopper(%distribution);
    }
    return %distribution;
}
my $stopper = { last if %^a{'Balanced'}.elems >= 10 };
my %categories = distribute &mapper, @p, $stopper;
for sort keys %categories -> $key {
    say "$key primes:  ", %categories{$key}[0..9];
}

The only trick here if that the $stopper code block uses a self-declared positional parameter (or placeholder), %^a. And we pass the %distribution hash as a parameter to &stopper when we run it within the distribute subroutine. Thus, the calling code doesn't have to know the name of the hash within the distribute subroutine, which is now generic. To be frank, I wasn't convinced this would work until I ran it.

The output is what we want:

$ perl6 strong_primes.p6
Balanced primes:  (5 53 157 173 211 257 263 373 563 593)
Strong primes:  (11 17 29 37 41 59 67 71 79 97)
Weak primes:  (3 7 13 19 23 31 43 47 61 73)

Categorizing or Classifying Primes In Perl 5

Except for the last improvement made just above (using a self-declared parameter), we can do more or less the same in Perl 5:

my @p; # Array to store the primes
sub is_prime{
    my $num = shift;
    for my $i (2 .. $num**.5) {
        return 0 if $num % $i == 0;
    }
    return 1;
}
sub generate_prime_iterator {
    my $last_val = shift // 1;
    return sub {
        do {
            $last_val ++;
        } until is_prime $last_val;
        return $last_val;
    }
}
sub mapper {
    my $i = shift;
    $p[$i] > ($p[$i - 1] + $p[$i + 1])/2 ? 'Strong' :
    $p[$i] < ($p[$i - 1] + $p[$i + 1])/2 ? 'Weak'   :
    'Balanced';
}
sub distribute {
    my ($prime_generator, $mapper) = @_;
    my %distribution;
    my $k = 0;
    while (1) {
        $k++;
        push @p, $prime_generator->() while not defined $p[$k+1];
        push @{$distribution{$mapper->($k)}}, $p[$k];
        last if defined $distribution{Balanced} and 
            scalar @{$distribution{Balanced}} >= 10;
    }
    return %distribution;
}
my $give_me_a_prime = generate_prime_iterator 1;
my %categories = distribute $give_me_a_prime, \&mapper;
say "Strong primes: @{$categories{'Strong'}}[0..9]";
say "Weak primes: @{$categories{Weak}}[0..9]";
say "Balanced primes: @{$categories{Balanced}}[0..9]";

This works as intended. We cannot pass the stopper as a code block as in Perl 6, but there would be some work around if we wanted to. For example, the stopper block of Perl 6 could be passed as a string containing the code block and be evaled in the distribute subroutine. Another limitation is that the @p prime array is a global variable. This could be avoided if we wanted to, but I am not convinced that it makes sense to bend over backward and make the code significantly more complicated just for the sake of bureaucratic rules. In this case, the @p prime array is part of a caching strategy, it makes sense to have it a global variable.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, July 14 (Bastille Day). And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 15: Strong and Weak Primes and Vigenère Encryption

These are some answers to the Week 15 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (July 7, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Strong and Weak Prime Numbers

Write a script to generate first 10 strong and weak prime numbers.

For example, the nth prime number is represented by p(n).

  p(1) = 2
  p(2) = 3
  p(3) = 5
  p(4) = 7
  p(5) = 11

  Strong Prime number p(n) when p(n) > [ p(n-1) + p(n+1) ] / 2
  Weak   Prime number p(n) when p(n) < [ p(n-1) + p(n+1) ] / 2

A strong prime is a prime number that is greater than the arithmetic mean of the nearest prime above and below (in other words, it's closer to the following than to the preceding prime). A weak prime is a prime number that is less than the arithmetic mean of the nearest prime above and below. Obviously, a prime number cannot both strong and weak, but some prime numbers, such as 5 or 53 (we'll see more of them later), are neither strong, nor weak (they're called balanced primes): 5 is equal to the arithmetic mean of 3 and 7. Therefore, the fact that a prime is not strong doesn't mean that it is weak.

For this challenge, I see some advantages to Perl 6 compared to Perl 5, so I'll start with Perl 6.

Strong and Weak Prime Numbers in Perl 6

We don't know in advance how many prime numbers we'll need to check to find 10 strong and 10 weak primes. This is a typical situation where using infinite lazy lists is very practical.

In the first code example below, we first build a lazy infinite list of prime numbers, and then use grep to filter the strong (and weak) primes, so as to construct lazy infinite lists of strong and weak primes, and finally print out the first 10 numbers of each such list. This is fairly straight forward:

use v6;

my @p = grep { .is-prime }, 1..*;   #Lazy infinite list of primes
my @strong = map { @p[$_] }, 
    grep { @p[$_] > (@p[$_ - 1] + @p[$_ + 1]) / 2 }, 1..*;
my @weak = map { @p[$_] }, 
    grep { @p[$_] < (@p[$_ - 1] + @p[$_ + 1]) / 2 }, 1..*;
say "Strong primes: @strong[0..9]";
say "Weak primes: @weak[0..9]";

This script displays the following output:

$ perl6 strong_primes.p6
Strong primes: 11 17 29 37 41 59 67 71 79 97
Weak primes: 3 7 13 19 23 31 43 47 61 73

We don't really need to build the intermediate @strong and @weak lazy infinite lists, but can print out the results directly:

use v6;

my @p = grep { .is-prime }, 1..*;   # Lazy infinite list of primes
say "Strong primes: ", (map { @p[$_] }, 
    grep { @p[$_] > (@p[$_ - 1] + @p[$_ + 1]) / 2 }, 1..*)[0..9];
say "Weak primes: ", (map { @p[$_] }, 
    grep { @p[$_] < (@p[$_ - 1] + @p[$_ + 1]) / 2 }, 1..*)[0..9];

This prints out the same lists as before:

perl6 strong_primes.p6
Strong primes: (11 17 29 37 41 59 67 71 79 97)
Weak primes: (3 7 13 19 23 31 43 47 61 73)

We're now down to three code lines instead of five (except that I'll probably have to format each of the two last lines over two lines to fit cleanly on this blog post).

Categorizing or Classifying Primes

One slight problem with the implementation above is that, once we have generated our list of primes, we need to go through it twice with the map ... grep chained statements, one for the strong primes and once for the weak primes; and we'd need to visit the prime list a third time for finding balanced primes. Although the script runs very fast, it would be better if we could do the categorizing in one go. Perl 6 has two built-in routines to do that, categorize and classify. Let's use the first one:

use v6;

my @p = grep { .is-prime }, 1..*;   # Lazy infinite list of primes
sub mapper(UInt $i) {
    @p[$i] > (@p[$i - 1] + @p[$i + 1])/2 ?? 'Strong' !!
    @p[$i] < (@p[$i - 1] + @p[$i + 1])/2 ?? 'Weak'   !!
    'Balanced';
}
my %categories = categorize &mapper, 1..120;
for sort keys %categories -> $key {
    say "$key primes:  ", map {@p[$_]}, %categories{$key}[0..9];
}

Running this program produces the following output:

$ perl6 strong_primes.p6
Balanced primes:  (5 53 157 173 211 257 263 373 563 593)
Strong primes:  (11 17 29 37 41 59 67 71 79 97)
Weak primes:  (3 7 13 19 23 31 43 47 61 73)

Here, we define a mapper subroutine to find out whether a given prime is strong, weak or balanced. Then, we pass to categorize two arguments: the subroutine and a list of subsequent integers (the indices of the @p prime number list) starting with 1 (the first prime cannot be weak or strong or balanced, since it has no predecessor) and store the result in the %categories hash, which is in fact a hash of arrays with three keys (one for each type of primes) and values being the index in the @p prime array of primes belonging to the corresponding type.

For example, with an input range of 1..30, the %categories hash has the following contents:

{ 
    Balanced => [2 15], 
    Strong => [4 6 9 11 12 16 18 19 21 24 25 27 30], 
    Weak => [1 3 5 7 8 10 13 14 17 20 22 23 26 28 29]
}

Remember that the numbers in the three lists above are not the primes, but the indices of the primes.

Then, the for loop extracts 10 numbers from each key of hash (with a full input range of 1..120).

This categorize built-in is very useful and practical for cases where you want to split some input data into different categories, but it isn't well adapted to our case in point, because it does not work with lazy lists. And since balanced primes are much less common than strong and weak primes, I was forced to use a relatively large range of 1..120 to make sure that I would get 10 balanced primes. For this specific problem, the classify built-in subroutine works essentially as categorize and also reports the Cannot classify a lazy list error message when trying to use it on a lazy infinite list. The difference between categorize and classify is that the latter returns a scalar whereas the former can return a list; so, in our example, it might have been slightly better to use classify rather than categorize, but the difference between the two built-ins is insignificant in our case.

I might come back to this issue in a later blog post (update: see this new post).

Strong and Weak Prime Numbers in Perl 5

In my blog post about Perl Weekly Challenge # 8 related to perfect numbers and Mersenne's numbers, and in a couple of other places before, I've shown a set of two somewhat complex subroutines (is_prime and find_primes) to generate relatively efficiently large primes with trial division. This time, we don't need large primes and can use a simpler (and less efficient, but sufficient) algorithm to check primality. This is done in the is_prime subroutine below.

As noted in the introduction above, a strong prime is closer to the following prime than to the preceding prime. So, rather than computing the arithmetic mean of the nearest prime above and below (as we did in our P6 implementation), we will this time use this alternate definition and compare the differences between a given prime and its preceding and succeeding one:

  Strong Prime p(n) when  [ p(n) - p(n-1) ] > [ p(n+1) - p(n) ]
  Weak   Prime p(n) when  [ p(n) - p(n-1) ] < [ p(n+1) - p(n) ]

Since Perl 5 does not support infinite lists as Perl 6, we need to specify some hard-coded ranges for our list or primes. Given that we've solved the problem in P6, it isn't difficult to figure out quite precisely the ranges that we need. If we had not done it before in P6, we would have had to choose somewhat larger ranges to be on the safe side of things.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub is_prime{
    my $num = shift;
    for my $i (2 .. $num ** .5) {
        return 0 if $num % $i == 0;
    }
    return 1;
}
my @p = grep is_prime($_), 2..105;
my @strong = map $p[$_], 
    grep { $p[$_] - $p[$_-1] > $p[$_+1] - $p[$_] } 1..25;
my @weak = map $p[$_], 
    grep { $p[$_] - $p[$_-1] < $p[$_+1] - $p[$_] } 1..25;
say "Strong: @strong[0..9]";
say "Weak: @weak[0..9]";

This script displays the same primes as in P6:

$ perl strong_primes.pl
Strong: 11 17 29 37 41 59 67 71 79 97
Weak: 3 7 13 19 23 31 43 47 61 73

Challenge # 2: Vigenère Encryption

Write a script to implement Vigenère cipher. The script should be able encode and decode. Checkout [wiki page](https://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher] for more information.

The Vigenère cipher is actually a misnomer: in the nineteenth century, it has been mis-attributed to French diplomat and cryptographer Blaise de Vigenère, who published the method in 1586, and this is how it acquired its present name. But the method had been described more than three decades earlier (in 1553) by Italian cryptanalyst Giovan Battista Bellaso. It essentially resisted all attempts to break it until 1863, three centuries later.

To understand the Vigenère cipher, we can first consider what is known as the Caesar cipher, in which each letter of the alphabet is shifted along some number of places. For example, in a Caesar cipher of shift 3, A would become D, B would become E, Y would become B and so on. So, for instance, "cheer" rotated by 7 places is "jolly" and "melon" rotated by -10 (or + 16) is "cubed". In the movie A Space Odyssey, the ship's computer is called HAL, which is IBM rotated by -1. One famous such cipher is ROT13, which is a Caesar cipher with rotation 13. Since 13 is half the number of letters in our alphabet, applying rotation 13 twice returns the original message, so that the same procedure can be used for both encoding and decoding in rotation 13. Rotation 13 has been used very commonly on the Internet to hide potentially offensive jokes or to weakly hide the solution to a puzzle.

A Caesar cipher is very easy to break through letter frequency analysis.

In Edgar Allan Poe’s short story The Gold Bug, one of the characters, William Legrand, uses letter frequencies to crack a cipher. He explains:

Now, in English, the letter which most frequently occurs is e. Afterwards, the succession runs thus: a o i d h n r s t u y c f g l m w b k p q x z. E however predominates so remarkably that an individual sentence of any length is rarely seen, in which it is not the prevailing character.

Edgar Poe's character is slightly wrong on part of the succession of letters: for example, he grossly underestimated the frequency of letter t, which is the second most common letter in English. But what he says about letter E is correct.

So, if you want to decipher a message encoded with a Caesar cipher in English, one way is to find out the most common letter in the encoded text, and that most common letter is likely to be an E. From there, you can figure out by which value each letter has shifted and decipher the whole message. If you were unlucky, just give a try with the second most common letter, and then the third. You're very likely to succeed very quickly. Another possibility is brute force attack by trying all 26 possible values by which the letter are shifted. This is easy by hand, and very fast with a computer. A Caesar cipher is a very weak encryption system.

The idea of the Vigenère cipher is to shift the letters of the message by a different number of places. If your encryption code is 1452, you rotate the first letter by one place, the second one by 4 places, the third by 5 places, the fourth by 2 places; if you have more letters to encode in your message, then your start again with the beginning of the code, and so on. For example, if you want to encode the word "peace," you get:

p + 1 => q
e + 4 => i
a + 5 => f
c + 2 => e
e + 1 => f
Encoded message: qifef.

In brief, a Vigenère cipher is using a series of interwoven Caesar ciphers. With such a system, frequency analysis becomes extremely difficult because, as we can see in the example above, the letter E is encoded into I in the first instance, and into F in the second instance. In fact, if the encryption key is a series of truly random bytes and is at least as long as the message to be encoded (and is used only once), the code is essentially unbreakable. In practice, a Vigenère cipher is not using a number as encryption key, but generally a password or a pass-phrase: the letters of the password are converted to a series of numbers according to their rank in the alphabet and those numbers are used as above to rotate the letters of the message to be encoded. Since the encryption code is probably no longer truly random, it becomes theoretically possible to break the code, but this is still very difficult, and that's the reason the Vigenère cipher has been considered unbreakable for about three centuries.

Vigenère Cipher in Perl 6

For this challenge, we will use the built-in functions ord, which converts a character to a numeric code (Unicode code point), and chr which converts such numeric code back to a characters. Letters of the alphabet are encoded in alphabetic order, so that, for example:

say ord('c') - ord('a'); 2

because 'c' is the second letter after 'a'.

Originally, I kept letters within the a..z range (folding the input message to lowercase), because the numeric codes for uppercase letters are different, in order to keep as close as possible to the original Vigenère cipher. But the original cipher was limited to this range only because of the way encoding was done manually at the time. With a computer, there is no reason to limit ourselves to such range. So, the script below use the full range of an octet (0..255), i.e. the full extended ASCII range. This way we can also encode spaces, punctuation symbols, etc. Of course, this implies that the partner uses the script (or, at least, same algorithm).

In this script, the bulk of the work is done in the rotate-msg and rotate-one-letter subroutines. The encode and decode subroutines are only calling them with the proper arguments. And the create-code subroutine is used to transform the password into an array of numeric values.

use v6;

subset Letter of Str where .chars == 1;

sub create-code (Str $passwd) {
    # Converts password to a list of numeric codes
    # where 'a' corresponds to a shift of 1, etc.
    return $passwd.comb(1).map: {.ord - 'a'.ord + 1}
}
sub rotate-one-letter (Letter $letter, Int $shift) {
    # Converts a single letter and deals with cases 
    # where applying the shift would get out of range
    constant $max = 255;
    my $shifted = $letter.ord + $shift;
    $shifted = $shifted > $max ?? $shifted - $max !!
        $shifted < 0 ?? $shifted + $max !!
        $shifted;
    return $shifted.chr;
}
sub rotate-msg (Str $msg, @code) {
    # calls rotate-one-letter for each letter of the input message
    # and passes the right shift value for that letter
    my $i = 0;
    my $result = "";
    for $msg.comb(1) -> $letter {
        my $shift = @code[$i];
        $result ~= rotate-one-letter $letter, $shift;
        $i++;
        $i = 0 if $i >= @code.elems;
    }
    return $result;
}
sub encode (Str $message, @key) {
    rotate-msg $message, @key;
}
sub decode (Str $message, @key) {
    my @back-key = map {- $_}, @key;
    rotate-msg $message, @back-key;
}
multi MAIN (Str $message, Str $password) {
    my @code = create-code $password;
    my $ciphertext = encode $message, @code;
    say "Encoded cyphertext: $ciphertext";
    say "Roundtrip to decoded message: {decode $ciphertext, @code}";
}
multi MAIN ("test") {
    use Test; # Minimal tests for providing an example
    plan 6;
    my $code = join "", create-code("abcde");
    is $code, 12345, "Testing create-code";
    my @c = create-code "password";
    for <foo bar hello world> -> $word {
        is decode( encode($word, @c), @c), $word, 
            "Round trip for $word";
    }
    my $msg = "One small step for man, one giant leap for mankind!";
    my $ciphertext = encode $msg, @c;
    is decode($ciphertext, @c), $msg, 
        "Message with spaces and punctuation";
}

In the script above, we have two MAIN multi subroutines. When the single argument is "test", the script runs a series of basic tests (which would probably have to be expanded in a real life project); when the arguments are two strings (a message to be encoded and a password), the script runs with the input arguments.

This is an example run with the "test" argument:

$ perl6  vigenere.p6 test
1..6
ok 1 - Testing create-code
ok 2 - Round trip for foo
ok 3 - Round trip for bar
ok 4 - Round trip for hello
ok 5 - Round trip for world
ok 6 - Message with spaces and punctuation

and with two arguments:

$ perl6  vigenere.p6 AlphaBeta password
Encoded cyphertext: Qmƒ{xQwxq
Roundtrip to decoded message: AlphaBeta

Vigenère Cipher in Perl 5

Translating this P6 script into Perl 5 is fairly easy. Just as in P6, we will use the built-in functions ord, which converts a character to a numeric code (Unicode code point), and chr which converts such numeric code back to a characters.

In this script, the bulk of the work is done in the rotate_msg and rotate_one_letter subroutines. The encode and decode subroutines are only calling them with the proper arguments. And the create_code subroutine is used to transform the password into an array of numeric values.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;


sub create_code {
    # Converts password to a list of numeric codes
    # where 'a' corresponds to a shift of 1, etc.
    my $passwd = shift;
    return map { ord($_) - ord('a') + 1 } split //, $passwd;
}
sub rotate_one_letter {
    # Converts a single letter and deals with cases where 
    # applying the shift would get the result out of range
    my ($letter, $shift) = @_;
    my $max = 255;
    my $shifted = $shift + ord $letter;
    $shifted = $shifted > $max ? $shifted - $max :
        $shifted < 0 ? $shifted + $max :
        $shifted;
    return chr $shifted;
}
sub rotate_msg {
    # calls rotate-one-letter for each letter of the input message
    # and passes the right shift value for that letter
    my ($msg, @code) = @_;
    my $i = 0;
    my $result = "";
    for my $letter (split //, $msg) {
        my $shift = $code[$i];
        $result .= rotate_one_letter $letter, $shift;
        $i++;
        $i = 0 if $i >= @code;
    }
    return $result;
}
sub encode {
    my ($message, @key) = @_;
    rotate_msg $message, @key;
}
sub decode  {
    my ($message, @key) = @_;
    my @back_key = map {- $_} @key;
    rotate_msg $message, @back_key;
}
sub run_tests {
    use Test::More; # Minimal tests for providing an example
    plan tests => 6; # needed on a separate code line to avoid 
                     # useless output when not running the tests
    my $code = join "", create_code("abcde");
    is $code, 12345, "Testing create_code";
    my @c = create_code "password";
    for my $word ( qw/foo bar hello world/) {
        is decode( encode($word, @c), @c), $word, 
            "Round trip for $word";
    }
    my $msg = "One small step for man, one giant leap for mankind!";
    my $ciphertext = encode $msg, @c;
    is decode($ciphertext, @c), $msg, 
        "Message with spaces and punctuation";
}


if (@ARGV == 1 and $ARGV[0] eq "test") {
    run_tests;
} elsif ( @ARGV == 2) {
    my ($message, $password) = @ARGV;
    my @code = create_code $password;
    my $ciphertext = encode $message, @code;
    say "Encoded cyphertext: $ciphertext";
    say "Roundtrip to decoded message: ", decode $ciphertext, @code;
} else {
    say "Wrong arguments";
}

Asides from the small syntax adjustments between P6 and P5, the main difference is that core Perl 5 doesn't have a MAIN subroutine and does not have multiple dispatch for subroutines. So, I just mimicked the P6 version by checking the arguments passed to the script and calling the run_test subroutine when needed.

This gives the same output as the P6 script:

$ perl vigenere.pl test
1..6
ok 1 - Testing create_code
ok 2 - Round trip for foo
ok 3 - Round trip for bar
ok 4 - Round trip for hello
ok 5 - Round trip for world
ok 6 - Message with spaces and punctuation

Laurent@LAPTOP-LHI8GLRC ~
$ perl vigenere.pl TangoCharlieJuliet password
Encoded cyphertext: dbz†Rze‚m|xa„~muu
Roundtrip to decoded message: TangoCharlieJuliet

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, July 14. And, please, also spread the word about the Perl Weekly Challenge if you can.

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about Perl (5 and 6).