Perl Weekly Challenge # 21: Euler's Number and URL Normalizing

These are some answers to the Week 21 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (August 18, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Euler's Number

Write a script to calculate the value of e, also known as Euler’s number and Napier’s constant. Please checkout this wiki page) for more information.

The number e is a mathematical constant that is the base of natural logarithms: It is the unique number whose natural logarithm is equal to 1 and its value is approximately 2.71828.

Euler's Number in Perl 6

Perl 6 has a built-in constant, e, for Euler's number, which we can just print:

$ perl6 -e 'say e'
2.718281828459045

So, job done? Well, maybe it is sort of cheating. Let's try to compute e really.

Let's try the formula used by Jacob Bernoulli in 1683: e is equal to the limit, when n tends to infinity, of (1 + 1/n)**n. We can just use this formula with a large input number:

use v6;

sub eul ($n) { (1 + 1/$n)**$n}

sub MAIN (Int $n) {
    say eul $n;
}

Let's try to run this program with increasing input numbers:

$perl6  euler.p6 5
2.48832

$perl6  euler.p6 10
2.5937424601

$perl6  euler.p6 100
2.7048138294215263

$perl6  euler.p6 1000
2.7169239322358925

$perl6  euler.p6 10000
2.718145926825225

It works, but the formula converges very slowly: with an input number of 10,000, we obtain only 4 correct digits. Let's try with a better formula. Euler's constant is equal to the sum, for n from 0 to infinity, of 1/n!, where n! is the factorial of n, i.e. the product of all positive integers between 1 and n.

For computing this, we will first define a new postfix operator, !, to compute the factorial of any number, and then use it to compute the sum. For this, we will use twice the [...] reduction metaoperator, which reduces a list of values with the given infix operator. For example,

say [+] 1, 2, 3, 4;   #  -> 10

is equivalent to:

say 1 + 2 + 3 + 4;

i.e. works as if the infix operator (+ in this example) was placed between each item of the list to produce an arithmetic expression yielding a single numerical value. This is the perfect functionality for computing both the factorial of an integer and the sum of terms of the formula.

use v6;

sub postfix:<!> (Int $n) {   # factorial operator
    [*] 2..$n;
}
sub eul (Int $n) {
    [+] map { 1 / $_! }, 0..$n;
}
sub MAIN (Int $n) {
    say eul $n;
}

The version with this new formula converges much faster than the original one:

$ perl6  euler.p6 10
2.7182818

$ perl6  euler.p6 100
2.718281828459045

Euler's Number in Perl 5

We don't have a builtin constant for e in Perl 5, but we can cheat almost as easily as in Perl 6:

$ perl -E 'print exp 1'
2.71828182845905

But, of course, we don't want to cheat: that wouldn't be a real challenge.

We've seen with our Perl 6 coding experiments that computing Euler's constant as the limit of the sum, for n from 0 to infinity, the terms 1/n! is quite efficient, as the result converges rather fast. We'll use the same formula in Perl 5.

Although Perl 5 does not allow the construction of new operators (such as the factorial ! operator in our P6 script) and does not have the [] reduction metaoperator, we can easily write subroutines for the same purposes.

Although I don't use them very commonly (because I am stuck on several servers at $work with old versions of Perl), I'll use here the subroutine signatures feature.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say signatures/;
no warnings 'experimental::signatures';

sub fact ($n) {
    my $fact = 1;
    $fact *= $_ for 2..$n;
    return $fact;
}
sub eul ($n) {
    my $euler;
    $euler += 1 / fact $_ for 0..$n;
    return $euler;
}

say eul shift;

This works as expected and produces the following output:

$ perl euler.pl 10
2.71828180114638

$ perl euler.pl 100
2.71828182845905

The first test with input 10 displays 8 correct digits, and all digits are correct in the second test with input 100.

URL Normalization

Write a script for URL normalization based on rfc3986. This task was shared by Anonymous Contributor.

According to Wikipedia, URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized URL so it is possible to determine if two syntactically different URLs may be equivalent.

URL normalization does not appear to be a well normalized process. Some of the changes may be useful for some purposes and unwanted in others. In the scripts suggested below, I have limited the changes to normalizations that preserve semantics plus removing dot-segments among the normalizations that usually preserve semantics. Other normalization rules are often unwanted or poorly defined.

To summarize, we will perform the following normalization actions:

  • Converting to lower case,
  • Capitalizing letters in escape sequences,
  • Decoding percent-encoded octets of unreserved characters,
  • Removing the default port,
  • Removing dot-segments.

URL Normalization in Perl 6

We will simply apply a series of successive regex substitutions to the URL, one (or in one case two) for each of the normalization actions.

In the normalize subroutine of the program below, we topicalize the URL (with the given keyword), so that we can use directly the regex substitution operator on the topical $_ variable. This simplifies the substitutions. We can write simply:

s:g/'/./'/\//;

instead of having to write, for each of the substitutions, something like:

$url ~~ s:g/'/./'/\//;

Each of the substitutions in the program below is commented to explain to which normalization action it refers to.

use v6;
use Test;

sub normalize (Str $url is copy) {
    constant $unreserved = (0x41..0x5A, 0x61..0x7A, 0x2D, 0x2E, 0x5F, 0x7E).Set;
    given $url {
        s:g/(\w+)/{lc $0}/;      # Lowercase letters
        s:g/('%'\w\w)/{uc $0}/;  # Capitalizing letters in escape sequences
        s:g/'%'(<xdigit>**2)     # Decoding percent-encoded octets
           <?{ (+"0x$0") (elem) $unreserved }> # code assertion
           /{:16(~$0).chr}/;
        s/':' 80 '/'/\//;        # Removing default port
        s:g/'/../'/\//;          # Removing two-dots segments
        s:g/'/./'/\//;           # Removing dot segments
    }
    return $url;
}

plan 5;
for < 1 HTTP://www.Example.com/              
        http://www.example.com/
      2 http://www.example.com/a%c2%b1b      
        http://www.example.com/a%C2%B1b
      3 http://www.example.com/%7Eusername/  
        http://www.example.com/~username/
      4 http://www.example.com:80/bar.html   
        http://www.example.com/bar.html
      5 http://www.example.com/../a/../c/./d.html 
        http://www.example.com/a/c/d.html
    > -> $num, $source, $target {
        cmp-ok normalize($source), 'eq', $target, "Test $num";
}

The five test cases work fine:

$ perl6  normalize_url.p6
1..5
ok 1 - Test 1
ok 2 - Test 2
ok 3 - Test 3
ok 4 - Test 4
ok 5 - Test 5

The decoding percent-encoded octets is a bit more complicated than the others and it might help to explain it a bit further. The first line:

    s:g/'%'(<xdigit>**2)     # Decoding percent-encoded octets

looks for a literal % character followed by two hexadecimal digits. But the match really occurs only if the code assertion immediately thereafter:

       <?{ (+"0x$0") (elem) $unreserved-range }> # code assertion

is successful, that is essentially if the two hexadecimal digits found belong to the $unreserved set of unreserved characters populated at the top of the subroutine. As a result, the substitution occurs only for the octets listed in that set.

Here, we have used five test cases, one for each of the normalization actions, because we don't have detailed specifications, but a real test plan would require more test cases based on actual specs.

URL Normalization in Perl 5

As for the P6 version, we will apply a series of successive regex substitutions to the URL, one (or in one case two) for each of the normalization actions.

In the normalize subroutine of the program below, we topicalize the URL (with the for keyword), so that we can use directly the regex substitution operator on the topical $_ variable. This simplifies the substitutions. We can write simply:

s{/\./}{/}g;

instead of having to write, for each of the substitutions, something like:

$url =~ s{/\./}{/}g;;

Each of the substitutions in the program below is commented to explain to which normalization action it refers to.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say/;
use Test::More tests => 5;

sub normalize {
    my $url = shift;
    my %unreserved = map {$_ => 1} 0x41..0x5A, 0x61..0x7A, 0x2D, 0x2E, 0x5F, 0x7E;
    for ($url) {
        s/(\w+)/lc $1/ge;    # Lowercase letters
        s/(%\w\w)/uc $1/ge;  # Capitalizing letters in escape sequences
        # Decoding percent-encoded octets
        if (/%([[:xdigit:]]{2})/ and exists $unreserved{hex $1} ) {
            s/%([[:xdigit:]]{2})/chr hex "0x$1"/xge;
        }
        s{:80/}{/};          # Removing default port
        s{/\.\./}{/}g;       # Removing two-dots segments
        s{/\./}{/}g;         # Removing dot segments
    }
    return $url;
}

for ( [ 1, 'HTTP://www.Example.com/',              'http://www.example.com/' ],
      [ 2, 'http://www.example.com/a%c2%b1b',      'http://www.example.com/a%C2%B1b' ], 
      [ 3, 'http://www.example.com/%7Eusername/',  'http://www.example.com/~username/' ],
      [ 4, 'http://www.example.com:80/bar.html',   'http://www.example.com/bar.html' ],
      [ 5, 'http://www.example.com/../a/../c/./d.html', 'http://www.example.com/a/c/d.html' ]
    ) { 
        my ($num, $source, $target) = @$_;
        is normalize($source),  $target, "Test $num";
}

The five test cases work fine:

 $ perl normalize_url.pl
 1..5
 ok 1 - Test 1
 ok 2 - Test 2
 ok 3 - Test 3
 ok 4 - Test 4
 ok 5 - Test 5

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 25. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 20: Split String on Character Change and Amicable Numbers

These are some answers to the Week 20 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (August 11, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Split String on Character Change (P5 and P6)

Write a script to accept a string from command line and split it on change of character. For example, if the string is "ABBCDEEF", then it should split like "A", "BB", "C", "D", "EE", "F".

For this, it seemed fairly obvious to me that a simple regex in a one-liner should do the trick. Well, it turned out to be slightly more complicated that I anticipated in Perl 5. For example, running this very simple Perl 5 one-liner:

$ perl -E 'say join " ", "ABBCDEEF" =~ /((.)\2*)/g;'
A A BB B C C D D EE E F F

does more or less the splitting job correctly, but does not provide the desired output: we get one unwanted extra field for each wanted field. We can decide to filter out the unwanted fields:

$ perl -E 'my @a = "ABBCDEEF" =~ /((.)\2*)/g; say join " ", map $a[$_], grep {not $_ % 2} 0..$#a;'
A BB C D EE F

$ perl -E 'my @a = "ABBBCDEEF" =~ /((.)\2*)/g; say join " ", map $a[$_], grep {$_ % 2 == 0} 0..$#a;'
A BBB C D EE F

That seems to work fine.

But there is in fact a simpler way to do it. The reason for the repeated fields is that we have two pairs of capturing parentheses, and we need both of them for the regex to work properly. But we can easily print only one of the captures (i.e. only $1):

$ perl -E 'print "\"$1\" " while "ABBCDEEF" =~ /((.)\2*)/g;'
"A" "BB" "C" "D" "EE" "F"

For some reason, my original P5 try works fine with Perl 6 (with the necessary syntax changes) without having to filter out anything, as shown below:

$ perl6 -e 'say ~$/ if "ABBBCDEEF" ~~ m:g/((.)$0*)/;'
A BBB C D EE F

$ perl6 -e 'say ~$/ if "ABBCDEEF" ~~ m:g/((.)$0*)/;'
A BB C D EE F

Challenge # 2: Amicable Numbers

Write a script to print the smallest pair of Amicable Numbers. For more information, please checkout wikipedia page.

Amicable numbers are two different numbers so related that the sum of the proper divisors of each is equal to the other number. (A proper divisor of a number is a positive factor of that number other than the number itself. For example, the proper divisors of 6 are 1, 2, and 3.)

Amicable Numbers in Perl 5

We'll use the sum_divisors subroutine to find the divisors of a given number and return their sum. Then, we just loop over integers from 2 onward and call sum_divisors subroutine. If the sum of divisors is larger than the integer being examined (if it is smaller, then it is a number that we have already checked), then we check the sum of divisors of the sum of divisors. If it is equal to the integer, then we've found two amicable numbers.

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";

sub sum_divisors {
    my $num = shift;
    my $limit = int $num / 2 ;
    my $sum = 1;
    for my $test_div (2..$limit) {
        $sum += $test_div if $num % $test_div == 0;
    }
    return $sum;
}

my $i = 2;
while (1) {
    my $sum_div = sum_divisors $i;
    if ($sum_div > $i and $i == sum_divisors $sum_div) {
        say "$i and $sum_div are amicable numbers";
        last;
    }
    $i++
}

Note that since we don't know in advance how large the first amicable numbers will be, we build an infinite loop and break out of it when we've found the first amicable numbers.

This program displays the following correct result:

$ perl amicable_nrs.pl
220 and 284 are amicable numbers

Amicable Numbers in Perl 6

We'll also use a sum_divisors subroutine doing something similar to the one in the P5 solution (but doing it in a somewhat simpler way). And loop over a lazy infinite list of integers with essentially the same algorithm as the P5 implementation.

use v6;

sub sum-divisors (Int $num) {
    my @divisors = grep { $num %% $_ }, 2..($num / 2).Int;
    return [+] 1, | @divisors;
}

for 2..Inf -> $i {
    my $sum_div = sum-divisors $i;
    if $sum_div > $i and $i == sum-divisors $sum_div {
        say "$i and $sum_div are amicable numbers";
        last;
    }
}

This program prints the same thing as the P5 program:

$ perl6 amicable_nrs.p6
220 and 284 are amicable numbers

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 18. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 19: Weekends and Wrapping Lines

These are some answers to the Week 19 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (August 4, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Five Weekends in a Month

Write a script to display months from the year 1900 to 2019 where you find 5 weekends i.e. 5 Friday, 5 Saturday and 5 Sunday.

This time, I'll start with Perl 6, because the built-in Date type seems to make it easier.

Five Weekends in Perl 6

My first idea was to loop over each month in the range 1900-2019. For each month, find the first Friday and then count the number of Sundays after that in the month. Then, even before I started to code that, it came to my mind that I didn't need to loop over all the days of the month, but just to count how many days there were in the month after the first Friday: there will five weekends if there are more than 29 days (4 weeks plus 1 day) after that first Friday of the month.

use v6;

for 1900..2019 -> $year {
    for 1..12 -> $month {
        my $day = 1;
        my $date = Date.new($year, $month, $day);
        my $last-day-of-month = 
            $date.later(month => 1).earlier(day => 1);
        ++$date until $date.day-of-week == 5;
        say $year, "-", $month.fmt("%02d"), " has 5 weekends" 
            if $last-day-of-month - $date > 29;
    }
}

That works fine:

1901-03 has 5 weekends
1902-08 has 5 weekends
1903-05 has 5 weekends
1904-01 has 5 weekends
...
(lines omitted for brevity)
...
2016-01 has 5 weekends
2016-07 has 5 weekends
2017-12 has 5 weekends
2019-03 has 5 weekends

Then I started to check the result and looked at a calendar, and it became obvious to me that it is actually even much simpler than that: to have 5 full weekends (Friday through Sunday), a month needs to have 31 days (so January, March, May, etc.) and to start with a Friday. So this is my new simpler script:

use v6;

for 1900..2019 -> $year {
    for 1, 3, 5, 7, 8, 10, 12 -> $month {
        say "$year-{$month.fmt("%02d")} has 5 weekends." 
            if Date.new($year, $month, 1).day-of-week == 5;
    }
}

This prints the same as before, there no point repeating the output.

Five Weekends in Perl 5

I originally thought that solving the challenge in Perl 5 would require either complex computations or the use of sophisticated modules such as Date::Calc or Time::Piece. Given the simplification found in the course of solving the problem in Perl 6, we only need the Time::Local core module, which provides functions that are the inverse of built-in localtime() and gmtime() functions.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Time::Local;

for my $year (0..119) {
    for my $month ( 1, 3, 5, 7, 8, 10, 12) {
        my $date = timegm (0, 0, 0, 1, $month - 1, $year);
        say $year + 1900, "-", sprintf("%02d ", $month)
            if (gmtime $date)[6] == 5;
    }
}

This program produces the same output as the P6 solution above.

Wrapping Lines

Write a script that can wrap the given paragraph at a specified column using the greedy algorithm.

For this, we will suppose our wrap subroutine receives a line of text as a parameter and return a string wrapped to fit a certain maximal width. We will also suppose that the input text is plain ASCII, some changes may have to be done for Unicode text.

Wrapping Lines in Perl 6

The most typical way to solve such a problem is usually to split the input into words and to add words to a line until it goes over the maximal width; at this point, the script removes the last word, produces the line and starts the new iteration with the removed word.

I tend to think it will be more efficient to look for a space backward (with the rindex built-in function) from the maximal length position, because this will be doing less string manipulations:

use v6;

sub wrap (Str $line is copy, Int $width) {
    my $out = '';
    while ($line) {
        return $out ~ "$line\n" if $line.chars < $width;
        my $pos = rindex $line, ' ', $width - 1;
        $out = $out ~ substr($line, 0, $pos) ~ "\n";
        $line = substr $line, $pos+1;
    }
    return $out;
}

my $in = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Dolor sed viverra ipsum nunc aliquet bibendum enim. In massa tempor nec feugiat. Nunc aliquet bibendum enim facilisis gravida. Nisl nunc mi ipsum faucibus vitae aliquet nec ullamcorper. Amet luctus venenatis lectus magna fringilla.";

say wrap $in, $_ for 60, 35;

This produces the following output:

$ perl6 wrap.p6
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Dolor sed viverra ipsum nunc aliquet bibendum enim.
In massa tempor nec feugiat. Nunc aliquet bibendum enim
facilisis gravida. Nisl nunc mi ipsum faucibus vitae
aliquet nec ullamcorper. Amet luctus venenatis lectus magna
fringilla.

Lorem ipsum dolor sit amet,
consectetur adipiscing elit, sed
do eiusmod tempor incididunt ut
labore et dolore magna aliqua.
Dolor sed viverra ipsum nunc
aliquet bibendum enim. In massa
tempor nec feugiat. Nunc aliquet
bibendum enim facilisis gravida.
Nisl nunc mi ipsum faucibus vitae
aliquet nec ullamcorper. Amet
luctus venenatis lectus magna
fringilla.

Wrapping Lines in Perl 5

Since Perl 5 also has the rindex built-in function, we can just do the same in Perl 5:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub wrap {
    my ($line, $width) = @_;
    my $out = '';
    while ($line) {
        return $out . "$line\n" if length $line < $width;
        my $pos = rindex $line, ' ', $width - 1;
        $out = $out . substr($line, 0, $pos) . "\n";
        $line = substr $line, $pos+1;
    }
    return $out;
}

my $in = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Dolor sed viverra ipsum nunc aliquet bibendum enim. In massa tempor nec feugiat. Nunc aliquet bibendum enim facilisis gravida. Nisl nunc mi ipsum faucibus vitae aliquet nec ullamcorper. Amet luctus venenatis lectus magna fringilla.";
say wrap $in, $_ for 60, 35;

This produces the same output as the Perl 6 implementation, so I won't repeat it here.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 11. And, please, also spread the word about the Perl Weekly Challenge if you can.

And see you hopefully at the European Perl Conference in Riga, Aug. 7 to 9, 2019.

Perl Weekly Challenge # 18: Priority Queues and Binary Heaps In Perl 6

In this previous blog post, I provided some answers to the Week 18 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

However, I omitted the Perl 6 solution to the priority queues part of the challenge, because I wanted to use a binary heap to solve it, and this required too many explanations: the blog post was just getting too long. So this post will complete the previous post and look into binary heaps in Perl 6.

Spoiler Alert: This weekly challenge deadline is due in several days from now (July 28, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

This part of the challenge goes like this:

Write a script to implement Priority Queue. It is like regular queue except each element has a priority associated with it. In a priority queue, an element with high priority is served before an element with low priority. Please check this wiki page for more information. It should serve the following operations:

  • 1) is_empty: check whether the queue has no elements.

  • 2) insertwithpriority: add an element to the queue with an associated priority.

  • 3) pullhighestpriority_element: remove the element from the queue that has the highest priority, and return it. If two elements have the same priority, then return element added first.

As already explained in my previous post, there are numerous ways to design simple priority queues (at least when performance is not an issue, for instance if the data set isn't very large). For example, it might be sufficient to maintain an array of arrays (AoA), where each of the arrays is a pair containing the item and associated priority. Or an array of hashes (AoH) based on the same idea. This means that each time we want to pull the highest priority element, we need to traverse the whole data structure to find the item with the highest priority. This may be quite slow when there are many items, but this may not matter if our data structure only has a few dozen items.

Another way is to build a hash of arrays (HoA), where the hash keys are the priorities and the hash values are references to arrays. When the number of priorities is relatively small (compared to the number of items in the queues), this tends to be more efficient, but note that we still need to traverse the keys of the hash until we find the highest priority. An AoA with the index being the priority and the sub-hashes the item might be more efficient (because the priorities remain sorted), but this requires the priorities to be relatively small positive integers. We still have to traverse the top data structure until we find the first non-empty sub-array. We used this data structure for our Perl 5 implementation and can translate our P5 code into Perl 6 as follows:

use v6;

sub new-queue {
    my @queue;  # an AoA
    sub is_empty {
        for @queue -> $item {
            next unless defined $item;
            return False if $item.elems > 0;
        }
        True;
    }
    sub insert_with_prio ($item, Int $prio) {
        push @queue[$prio], $item;
    }
    sub pull_highest_prio {
        for reverse @queue -> $item {
            next unless defined $item;
            return shift $item if $item.elems > 0;
        }
    }
    return &is_empty, &insert_with_prio, &pull_highest_prio;
}

my (&is-empty, &insert, &pull-prio) = new-queue;
# Testing the above code
for 1..20 -> $num {
    insert($num,
        $num %% 10 ?? 10 !!
        $num %% 5  ?? 5  !!
        $num %% 3  ?? 3  !!
        $num %% 2  ?? 2  !! 
        1);
}        
for 1..20 -> $num {
    say pull-prio;
}
say "Empty queue" if is-empty();

As in the P5 version, we're using functional programming to implement a pseudo-object system, in which the new-queue subroutine is acting as an object constructor, although it is implemented as a function factory. The @queue object is limited to the scope of the new-queue constructor, but the three methods returned to the caller are closures that can access to the content of their shared object, @queue. Please refer to the other blog post mentioned above for further details on this implementation and on the test code at the end. This script displays more or less the same as the P5 implementation:

$ perl6  queues.p6
10
20
5
15
3
6
9
12
18
2
4
8
14
16
1
7
11
13
17
19
Empty queue

Another possibility is to use a heap, a data structure that usually has better performance (when it matters). This what we will look into now.

Background on Binary Heaps

A binary heap is a binary tree that keeps a partial order: each node has a value less than its parent and larger than either of its two children; there is no specific order imposed between siblings. (You may also do it the other way around: you can design heaps in which any node has a value larger than its parent, you basically only need to reverse the comparison.)

Because there is no order between siblings, it is not possible to find a particular element without potentially searching the whole heap. Therefore, a heap is not very good if you need random access to specific nodes. But if you're interested in always finding the largest (or smallest) item, then a heap is a very efficient data structure.

Heaps are used for solving a number of computer science problems, and also serve as the basis for an efficient and very popular sorting technique called heap sort.

For a human, it is useful to represent a heap in a tree-like form. But a computer can store a heap as a simple array (not even a nested array). For this, the index of an element is used to compute the index of its parent or its two children. Roughly speaking, the children of an element are the two locations where the indices are about double its index; conversely, the parent of a node is located at about half its index. If the heap starts at index 0, the exact formulas for a node with index $n are commonly as follows:

  • Parent: int( ($n-1)/2 )
  • Left child: 2*$n + 1
  • Right child: 2*$n + 2

The root node is at index 0. Its children are at positions 1 and 2. The children of 1 are 3 and 4 and the children of 2 are 5 and 6. The children of 3 are 7 and 8, and so on.

Suppose we build a heap (in ascending order) from an array of all letters between a and v provided in any pseudo-random order, for example:

my @input =  <m t f l s j p o b h v k n q g r i a d u e c>;

The resulting @heap might be something like this:

[a b g d c k j l f h e m n q p t r o i u s v]

We will see below on how to build such a heap from an unordered array, but let's concentrate for now on the heap properties.

The order in the @heap above may not be immediately obvious, but a is the smallest letter, and its two children, b and g, are larger than a. The children of b are d and c and are larger than their parent b. Similarly, the children of g are k and j and are larger than their parent. And so on. But it is rather inconvenient to manually check that we have a valid heap. So, we may want to write a helper subroutine to display the heap in a slightly more graphical way:

sub print-heap (@heap) {
    my $start = 0;
    my $end = 0;
    my $last = @heap.end;
    my $step = 1;
    loop {
        say @heap[$start..$end];
        last if $end == $last;
        $start += $step;
        $step *= 2;
        $end += $step;
        $end = $last if $end > $last;
    } 
}

This subroutine will not be used in the final code, but it proved to be very useful for debugging purposes.

If we pass the letter heap as an argument to this subroutine, it will be displayed in the following format:

(a)
(b g)
(d c k j)
(l f h e m n q p)
(t r o i u s v)

With a little bit of reformatting we can now see its structure in a tree-like format:

                (a)
        (b              g)
    (d       c      k        j)
  (l  f    h  e   m  n     q   p)
(t r o i  u s v ...)

And from that, we can now easily draw the tree:

heap.png

The important thing to notice is that there is no particular order between siblings, but children are always larger than their parent.

How to build a Binary Heaps

Since we'll be dealing later with integers (priorities) in descending order, we will abandon our ascending order letter heap. Let's suppose we have this heap example taken from the implementation section of the Wikipedia page on heaps):

my @heap = 100, 19, 36, 17, 12, 25, 5, 9, 15, 6, 11;

For the time being, we will consider it is a global variable accessible anywhere in the file.

Our print-heap helper subroutine would display it as:

(100)
(19 36)
(17 12 25 5)
(9 15 6 11)

We can see it's a valid heap (the children are always smaller than their parent).

Let's now add a new item, say 45, at the end of this array (for example with the push function). Of course, this item is not at its right place and the array is no longer a valid heap, but we can now use the following subroutine to move items around in order to obtain again a valid heap:

sub add-in-heap ($index is rw) {
    my $index-val = @heap[$index];
    while ($index) {
        my $parent-idx = Int( ($index - 1) /2);
        my $parent-val = @heap[$parent-idx];
        last if $parent-val >    $index-val;
        @heap[$index] = $parent-val;
        $index = $parent-idx;
    }
    @heap[$index] = $index-val;
}

The parameter passed is the index of the item that has just been added at the end of the array (11 in this example). This subroutine looks at the value of the parent of this new item. If the parent is larger than the new item, then we're done: the new array happens to be a valid heap (which is not the case in our example). If not, then we move the parent value to the position where we've just added the new element. Then we change the index of interest to the parent and iterate this way until either the elements are in the right place (the parent value is larger than the current index value) or the index become 0 (we've reached the root node). At this point, the loop ends and we can put the value we've added in the right place. If you think about it in terms of the binary tree shown above, we're really exploring the single path from the added element to the root (although we may not have to go all the way up to the root), the rest of the heap remains untouched.

Note that this subroutine is not designed to do anything special when fed with duplicate values. Here, duplicates will he handled gracefully and returned in the correct order. So, that's OK, it works fine, but we'll have to do something special about it when we will implement priority queues (if we had two priorities with the same value in the heap, we would be unable to predict the order in which items having the same priority will be pulled).

This subroutine will move around items from parent to child from the end to the beginning of the array (or at least until the new added value finds its right place), so that we get a new valid heap:

[100 19 45 17 12 36 5 9 15 6 11 25]

Using the helper subroutine to display the new heap outputs this:

(100)
(19 45)
(17 12 36 5)
(9 15 6 11 25)

I'll leave it to you to draw the tree to check that it is a proper heap.

We now know how to add an item to a existing heap, we can of course use that subroutine to add an item to an empty heap, and we can use that subroutine repeatedly to place each item in its proper place in order to create a heap from an input list in any order:

for @array.keys -> $i {
    my $idx = $i;
    add-in-heap $idx;
}

At the end of this loop, the @array will have been turned into a heap.

Removing One Element from the Heap

If we're looking for the largest element, it will be the root of the tree, i.e. the first item of the array.

Now, if we want to use this data structure to manage a priority queue, we will need at some point to delete the value in the root node and to reorganize the array so that it becomes again a legitimate heap. When we remove (100) from the above array, we have to choose the largest item between the two children, i.e. 45 in our example, and promote it as a new root node. And we can then propagate similarly the needed changes until the end of the array.

But the thermometer outside my house now shows 43.6 °C (in the shade), and it is more than 37°C inside. So, I'll be a bit lazy for a moment and, rather than writing such a new subroutine (which should be done if you want to be efficient and will be done below), I'll consider the array with the root node removed as an array in no particular order and use the code already written (the add-in-heap subroutine) to build a new heap from it:

sub take-from-heap {
    my $result = shift @heap;
    for @heap.keys -> $i {
        my $idx = $i;
        add-in-heap $idx;
    }
    return $result;
}

If we run that subroutine on our existing heap, it will return the largest item (the root node, i.e. 100) to the caller and reorganize the rest of the array into a new heap:

[45 36 17 15 25 5 9 12 6 11 19]

(45)
(36 17)
(15 25 5 9)
(12 6 11 19)

OK, this works, but reconstructing the full heap each time we remove an item is somewhat inefficient, which goes against the very purpose of heaps. What should a proper take-from-heap subroutine do? Take a look again at the binary tree displayed above. If we take off the root node value (a), we should replace it by b which is larger than g. It should be clear that we won't need to change anything in the g sub-tree. And we can recursively replace b by c, and then c by e and finally e by v. Nothing else needs to be changed. So basically we have to move up one step each of the nodes on the path of the smallest nodes in the b sub-tree. And, by the way, it is thanks to the fact that, whether we add a new item or remove an item from the heap, we only need to traverse one single path through the heap that insertion and deletion operations have a 0(log n) complexity and are therefore fairly fast. Implementing the ideas just described is not too difficult, but, for each visited node, we need to take into account three possible cases: this node may have 0, 1 or 2 children.

sub take-from-heap {
    my $result = @heap[0];
    my $index = 0;
    loop {
        my $left-index = 2 * $index + 1;  
            # right-index is $left-index + 1
        unless (defined @heap[$left-index] or 
            defined @heap[$left-index + 1]) {
                @heap.splice($index, 1);
                last;
        }
        unless defined @heap[$left-index + 1] {
            @heap[$index] = @heap[$left-index]:delete;
            last;
        }
        unless defined @heap[$left-index] { # probably not happening
            @heap[$index] = @heap[$left-index + 1]:delete;
            last;
        }            
        # both children are defined if we get here
        my $next-index = ($left-index, 
            $left-index + 1).max({@heap[$_]});
        @heap[$index] = @heap[$next-index];
        $index = $next-index;
    }
    return $result;
}

If we run this new subroutine on our previous heap, we obtain this new heap:

[45 19 36 17 12 25 5 9 15 6 11]
(45)
(19 36)
(17 12 25 5)
(9 15 6 11)

Note that this is not the same heap as the one obtained before (same data but not in the same order), but this is another valid heap for such data. Using this subroutine repeatedly, we'll get the nodes in the same order: 45, 36, 25, 19, 17 etc. For example, let's run the new take-from-heap 10 times on our original heap and print out each time the removed first item and the resulting heap:

say "First item = ", take-from-heap, "; Heap: ", @heap for 1..10;

We can see that we have a valid heap at each iteration and pull the values in the right order:

First item = 100; Heap: [45 19 36 17 12 25 5 9 15 6 11]
First item = 45; Heap: [36 19 25 17 12 5 9 15 6 11]
First item = 36; Heap: [25 19 9 17 12 5 15 6 11]
First item = 25; Heap: [19 17 9 11 12 5 15 6]
First item = 19; Heap: [17 12 9 11 5 15 6]
First item = 17; Heap: [12 11 9 5 15 6]
First item = 12; Heap: [11 15 9 5 6]
First item = 11; Heap: [15 6 9 5]
First item = 15; Heap: [9 6 5]
First item = 9; Heap: [6 5]

So, it seems that we have a working algorithm to manage heaps. Let's turn now to priority queues.

A Priority Queue as a Heap

Basically, we want to manage our priorities with a heap, and each priority will be associated with an array containing the individual items in the order in which they were inserted. To give you immediately an idea of the data structure, the queue will look like this at a certain point during the execution of the tests in the script below:

[[10 [10 20]] [5 [5 15]] [2 [2 4 8 14 16]] [1 [1 7 11 13 17 19]] [3 [3 6 9 12 18]]]

The first item in the queue displayed above, [10 [10 20]], is the data structure for priority 10, which contains two elements, 10 and 20. The next one is for priority 5. And so on.

When we are inserting elements (item and priority), we first call insert_with_prio to check whether there is already an array for the given priority. If it already exists, we just add the item to the array of elements associated with this priority. If there no array with such priority, then we call add-to-queue to add a priority data structure into the heap (and reorganize the heap as we've done before). Similarly, when we call pull_highest_prio, we just pick up and return the first element from the data array of the first priority item. In the event that the data array of a given priority becomes empty, then we call take-from-heap to remove the priority data structure from the heap (and reorganize the heap as we've done before).

use v6;
sub new-queue {
    my @queue;  # an AoA
    sub is_empty {
        @queue.elems == 0;
    }
    sub insert_with_prio ($item, Int $prio) {
        my $index = first {@queue[$_][0] == $prio}, @queue.keys;
        if (defined $index) {
            push @queue[$index][1], $item;
        } else {
            push @queue, [$prio, [$item]];
            my $idx = @queue.end;
            add-to-queue($idx);
        }
    }        
    sub pull_highest_prio {
        return Nil if is-empty;
        my $result = shift @queue[0][1];
        take-from-heap if @queue[0][1].elems == 0;
        return $result;
    }
    sub add-to-queue ($index is rw) {
        my $index-val = @queue[$index];
        while ($index) {
            my $parent-idx = Int( ($index - 1) /2);
            my $parent-val = @queue[$parent-idx];
            last if $parent-val[0] > $index-val[0];
            @queue[$index] = $parent-val;
            $index = $parent-idx;
        }
        @queue[$index] = $index-val;
    }
    sub take-from-heap {
        my $index = 0;
        loop {
            my $left-index = 2 * $index + 1; 
                # right-index is $left-index + 1
            unless (defined @queue[$left-index] or 
                defined @queue[$left-index + 1]) {
                    @queue.splice($index, 1);
                    last;
            }
            unless defined @queue[$left-index + 1] {
                @queue[$index] = @queue[$left-index]:delete;
                last;
            }
            unless defined @queue[$left-index] {
                @queue[$index] = @queue[$left-index + 1]:delete;
                last;
            }            
            # both children are defined if we get here
            my $next-index = ($left-index, 
                $left-index + 1).max({@queue[$_][0]});
            @queue[$index] = @queue[$next-index];
            $index = $next-index;
        }
    }

    return &is_empty, &insert_with_prio, &pull_highest_prio;
}
my (&is-empty, &insert, &pull-prio) = new-queue;
# Testing the above code: 20 insertions and then trying 30 deletions
for 1..20 -> $num {
    insert($num,
        $num %% 10 ?? 10 !!
        $num %% 5  ?? 5  !!
        $num %% 3  ?? 3  !!
        $num %% 2  ?? 2  !! 
        1);
} 
for 1..30 -> $num {
    last if is-empty;
    say pull-prio;
}
say "Empty queue" if is-empty();

This program displays more or less the same as before:

$ perl6  heap_queue.p6
10
20
5
15
3
6
9
12
18
2
4
8
14
16
1
7
11
13
17
19
Empty queue

Adding some additional print statements shows how the priority queue is evolving when we pull elements from it:

[[10 [10 20]] [5 [5 15]] [2 [2 4 8 14 16]] [1 [1 7 11 13 17 19]] [3 [3 6 9 12 18]]]

[ ... lines omitted for brevity ...]

Pulled 18; New queue: [[2 [2 4 8 14 16]] [1 [1 7 11 13 17 19]]]
Pulled  2; New queue: [[2 [4 8 14 16]] [1 [1 7 11 13 17 19]]]
Pulled  4; New queue: [[2 [8 14 16]] [1 [1 7 11 13 17 19]]]
Pulled  8; New queue: [[2 [14 16]] [1 [1 7 11 13 17 19]]]
Pulled 14; New queue: [[2 [16]] [1 [1 7 11 13 17 19]]]
Pulled 16; New queue: [[1 [1 7 11 13 17 19]]]
Pulled  1; New queue: [[1 [7 11 13 17 19]]]
Pulled  7; New queue: [[1 [11 13 17 19]]]
Pulled 11; New queue: [[1 [13 17 19]]]
Pulled 13; New queue: [[1 [17 19]]]
Pulled 17; New queue: [[1 [19]]]
Pulled 19; New queue: []
Empty queue

The code is quite long and is certainly not worth the effort if we're going to manage only 20 data elements and 5 priorities, as in our test cases above. But with much larger datasets and wider ranges of priority, it should be more efficient than our other implementations. If we're going to use many priority queues, the code of the add-to-queue and take-from-heap subroutines could be stored separately in a module, making the new-queue code much smaller and more manageable.

Note that the insert_with_prio subroutine is traversing sequentially the heap to figure out wether the priority data struture already exists in the heap. Depending on the number of priorities, this might become time consuming. It would be easy to add and maintain a hash keeping track of the existing priorities and their position in the heap, to avoid sequential search. I did not do it because I considered this to be an implementation detail that may be or may not be useful depending on the exact circumstances. I would probably do it if I were to write a heap priority library for a CPAN module.

A Heap Priority Queue in Perl 5

Asides from the small syntax differences between the two languages, there is nothing in what we did above in Perl 6 that cannot be done almost identically in Perl 5. For the benefit of readers who might not know yet the Perl 6 syntax or may otherwise find it easier to test and play in Perl 5, we provide here a translation of the heap priority queue in Perl 5:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

sub new_queue {
    my @queue;  # an AoA
    my $add_to_queue = sub  {
        my $index = shift;
        my $index_val = $queue[$index];
        while ($index) {
            my $parent_idx = int( ($index - 1) /2);
            my $parent_val = $queue[$parent_idx];
            last if $parent_val->[0] > $index_val->[0];
            $queue[$index] = $parent_val;
            $index = $parent_idx;
        }
        @queue[$index] = $index_val;
    };
    my $take_from_heap = sub {
        my $index = 0;
        while (1) {
            my $left_index = 2 * $index + 1; 
                # right_index is $left_index + 1
            unless (defined $queue[$left_index] or 
                defined $queue[$left_index + 1]) {
                    splice @queue, $index, 1;
                    last;
            }
            unless (defined $queue[$left_index + 1]) {
                $queue[$index] = delete $queue[$left_index];
                last;
            }
            unless (defined $queue[$left_index]) {
                $queue[$index] = delete $queue[$left_index + 1];
                last;
            }            
            # both children are defined if we get here
            my $next_index = $queue[$left_index][0] 
                > $queue[$left_index + 1][0] ?
                    $left_index : $left_index + 1;
            $queue[$index] = $queue[$next_index];
            $index = $next_index;
        }
    };
    my $is_empty = sub  {
        scalar @queue == 0;
    };
    my $insert_with_prio = sub  {
        my ($item, $prio) = @_;
        my $index;
        for my $priority (0..$#queue) {
            if($queue[$priority][0] == $prio) {
                $index = $priority;
                last;
            }
        }
        if (defined $index) {
            push @{$queue[$index][1]}, $item;
        } else {
            push @queue, [$prio, [$item]];
            my $idx = $#queue;
            $add_to_queue->($idx);
        }
    };

    my $pull_highest_prio = sub  {
        return undef if $is_empty->();
        my $result = shift @{$queue[0][1]};
        $take_from_heap->() if scalar @{$queue[0][1]} == 0;
        return $result;
    };

    return $is_empty, $insert_with_prio, $pull_highest_prio;
}
my ($is_empty, $insert, $pull_prio) = new_queue;
# Testing the above code: 20 insertions and deletions
for my $num (1..20) {
    $insert->($num,
        $num % 10 == 0 ? 10 :
        $num % 5  == 0 ? 5  :
        $num % 3  == 0 ? 3  :
        $num % 2  == 0 ? 2  : 
        1);
} 
for my $num (1..30) {
    last if $is_empty->();
    say $pull_prio->();
}
say "Empty queue" if $is_empty->();

I won't repeat the output displayed on the screen when running this program, as it is exactly the same as what we had with the Perl 6 implementation. And I won't comment further on it: almost everything I said before about the P6 implementation holds with this P5 version, so please refer to the above explanations on the P6 version if you need.

Note however the John Macdonald (one of the authors of the book Mastering Algorithms with Perl published by O'Reilly) wrote a collection of heap modules available on the CPAN. You might want to try some of them, especially his implementation of binary heaps.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 4. And, please, also spread the word about the Perl Weekly Challenge if you can.

And see you hopefully at the European Perl Conference in Riga, Aug. 7 to 9, 2019.

Perl Weekly Challenge # 18: Longest Common Substrings, Priority Queues, and a Functional Object System

These are some answers to the Week 18 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (July 28, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Longest Common Substrings

Write a script that takes 2 or more strings as command line parameters and print the longest common substring. For example, the longest common substring of the strings “ABABC”, “BABCA” and “ABCBA” is string “ABC” of length 3. Other common substrings are “A”, “AB”, “B”, “BA”, “BC” and “C”. Please check this wiki page for details.

I can see at least two ways to tackle the problem (to simplify, let's say between two strings). One is to have two nested loops, one on the letters of the first string and the second one on the letters of the second string, and to store the substrings (or, possibly, the longest so far). The other is to generate all the substrings of each word and then to compare them. I'll use the first approach in Perl 5 and the second one in Perl 6 (because P6 has some functionalities making the second approach easy and interesting, and probably quite efficient).

Note that the programs below will consider only extended ASCII strings for simplicity. A couple of very minor changes might be needed for dealing properly with full Unicode strings.

Longest Common Substring in Perl 5

We loop over the input strings and, for each pair of input strings, use the compare2str subroutine to find the common substrings. The %common hash is used to store the common substrings (we use a hash because it automatically removes duplicate substrings (which we don't need) and it is easier and faster to filter the result in order to keep only substrings that are common to all input strings.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub compare2str {
    my ($str1, $str2) = @_;
    my @st1 = split //, $str1;
    my @st2 = split //, $str2;
    my %result;
    my $common = '';
    my ($i, $j) = (0, 0);
    while ($i <= $#st1) {
        while ($j <= $#st2) {
            if ($st1[$i] eq $st2[$j]) {
                $common .= $st1[$i];
                $result{$common} = 1;
                my ($k, $l) = ($i, $j);
                while (1) {
                    $k++; $l++; 
                    if ($k <= $#st1 and $l<= $#st2 
                        and $st1[$k] eq $st2[$l]) {
                            $common .= $st1[$k];
                            $result{$common} = 1;;
                    } else {
                        $common = '';
                        last;
                    }
                }             
            }
            $j++;
        }
        $j = 0;
        $i++;
    }
    return keys %result;
}

die "Must supply at least two strings\n" unless @ARGV >= 2;
my %common = map { $_ => 1 } compare2str shift, $ARGV[0];
while (@ARGV > 1) {
    %common = map { $_ => 1 } grep $common{$_}, 
        compare2str shift, $ARGV[0];
}   
my $max = "";
for (keys %common) {
    $max = $_ if length $_ > length $max;
}
say "Largest common substring: $max";

This works as expected. For example:

$ perl  substrings.pl ABABCTO BABCTO ABCTBA
Largest common substring: ABCT

However, it turns out the code is quite complicated for a relatively easy task.

Of course, we could use a CPAN module such as String::LCSS, but that would be cheating: we want to implement the algorithm ourselves. It is certainly quite possible to improve the code above here and there in order to make it more concise and possibly clearer, but I will not do that because I want to test a different approach. I'll come back to that after we've seen our Perl 6 solution.

Longest Common Substring in Perl 6

As mentioned earlier, our approach for the Perl 6 version is to generate all the unique substrings of each input string and then only keep the substrings that are common to all strings. It is then easy to find the longest substring.

To generate all the substrings of a given string, we could probably use the regex engine with the :exhaustive adverb, to get all the overlapping matches, but we'll rather use the rotor built-in subroutine, which isn't mentioned very often although it is very powerful and expressive.

These are two examples using rotor under the REPL:

> 'abcd'.comb.rotor(1);
((a) (b) (c) (d))
> 'abcd'.comb.rotor(2);
((a b) (c d))

In these examples, rotor groups the elements of the invocant into groups of 1 and 2 elements. We're a long way from generating all the substrings of a given string. But we can do better:

> say 'abcd'.comb.rotor($_) for 1..4;;
((a) (b) (c) (d))
((a b) (c d))
((a b c))
((a b c d))

This is already much better, but we're still missing some of the desired substrings such as bc and bcd.

The rotor method can take as parameter a key-value pair, whose value (the second item) specifies a gap between the various matches:

> (1..10).rotor(2 => 1)
((1 2) (4 5) (7 8))

As you can see, we obtain pairs of values, with a gap of 1 between the pairs (item 3, 6 and 9 are omitted from the list. Now, the gap can also be negative and, in that case, we get all successive pairs from the range:

> (1..10).rotor(2 => -1)
((1 2) (2 3) (3 4) (4 5) (5 6) (6 7) (7 8) (8 9) (9 10))

The rotor subroutine can in fact do much more than that (check the rotor documentation), but I've basically shown the features that we'll use here.

The other Perl 6 functionality that we will use here is the the Set type and the associated intersection ( or (&)) operator. This operator does exactly what set intersection do in the mathematical set theory: it returns the elements that are common to the two sets.

We can now code the largest common substring in Perl 6:

use v6;
use Test;

sub substrings (Str $in) {
    my @result = $in.comb;
    append @result,  map { .join('') }, $in.comb.rotor: $_ => 1-$_ for 2..$in.chars;
    return set @result;
}
sub largest-substring (@words) {
    my Set $intersection = substrings shift @words;
    while (my $word = shift @words) {
        $intersection ∩= substrings $word;
    }
    return $intersection.keys.max({.chars});
}
multi MAIN (*@words where *.elems > 1) {
    say largest-substring @words;
}
multi MAIN () {
    plan 2;
    my @words = <ABABC BABCA ABCBA>;
    cmp-ok largest-substring(@words), 'eq', 'ABC', "Testing 3 strings";
    @words = 'abcde' xx 5;
    cmp-ok largest-substring(@words), 'eq', 'abcde', "Testing identical strings";
}

Running the program with no argument to run the tests produces this:

$ perl6  substrings.p6
1..2
ok 1 - Testing 3 strings
ok 2 - Testing identical strings

And with the same parameters as our P5 implementation, we get the same result:

perl6  substrings.p6 ABABCTO BABCTO ABCTBA
ABCT

Longest Common Substring in Perl 5 - Reloaded

The P6 script doing the bulk of the actual work is only 12 lines of code, whereas the equivalent in our initial Perl 5 implementation is 31 lines. Even though the P6 version is using features that don't exist in P5, perhaps we can mimic the same approach:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub substrings {
    my @chars = split //, shift;
    my %substr; # using a hash to remove duplicates
    for my $i (0..$#chars) {
        for my $j ($i..$#chars) {
            $substr{ join '', @chars[$i..$j] } = 1;
        }
    }
    return keys %substr;
}
my %result = map { $_ => 1} substrings shift;
for my $word (@ARGV) {
    %result = map {$_ => 1} grep $result{$_}, substrings $word;
}
my $max = 0;
for (keys %result) {
    $max = $_ if length $_ > length $max;
}
say "Largest common substring: $max";

This prints out:

$ perl  substrings.pl ABABCTO BABCTO ABCTBA
Largest common substring: ABCT

Now, the bulk of the actual work is done in 14 lines of code, less than half the original P5 implementation, and barely more that the P6 implementation. It is quite interesting: the features of P6 gave some ideas on how to solve the problem better in Perl 5.

Priority Queues

Write a script to implement Priority Queue. It is like regular queue except each element has a priority associated with it. In a priority queue, an element with high priority is served before an element with low priority. Please check this wiki page for more informations. It should serve the following operations:

  • 1) is_empty: check whether the queue has no elements.

  • 2) insertwithpriority: add an element to the queue with an associated priority.

  • 3) pullhighestpriority_element: remove the element from the queue that has the highest priority, and return it. If two elements have the same priority, then return element added first.

There are numerous ways to design simple priority queues (at least when performance is not an issue, for instance is the data set isn't very large). For example, it might be sufficient to maintain an array of arrays (AoA), where each of the arrays is a pair containing the item and associated priority. Or an array of hashes (AoH) based on the same idea. This means that each time we want to pull the highest priority element, we need to traverse the whole data structure to find the item with the highest priority. This may be quite slow when there are many items, but this may not matter if our data structure only has a few dozen items.

Another way is to build a hash of arrays (HoA), where the hash keys are the priority and the hash values references to arrays. When the number of priorities is relatively small (compared to the number of items in the queues), this tends to be more efficient, but note that we still need to traverse the keys of the hash until we find the highest priority. An AoA with the index being the priority and the sub-hashes the item might be more efficient (because the priorities remain sorted), but this requires the priorities to be relatively small positive integers. We still have to traverse the top data structure until we find the first non-empty sub-array. We will use this data structure for our Perl 5 implementation.

Another possibility is to use a heap, a data structure that usually has better performance (when it matters). We'll come back to that later.

Priority Queues in Perl 5 (Objects in Functional Programming)

A typical way of implementing queues with their various attached methods would obviously be with object-oriented programming (OOP), using either the standard Perl 5 object system, or any of the more modern variants such as Moose, Moo, Mouse, Mo, or whatever.

But since I'm not a typical person (and since, as those who have read some of my previous posts know by now, I like functional programming), I'll do it in an atypical way and build my objects with functional programming. As you'll see, it's quite simple. Defining my own light-weight object systems and my simple "queue class" takes just one subroutine with only 21 lines of code:

sub new_queue { 
    my @queue;  # an AoA
    my $is_empty = sub {
        for my $item (@queue) {
            next unless defined $item;
            return 0 if @$item > 0;
        }
        return 1;
    };
    my $insert_with_prio = sub {
        my ($item, $prio) = @_;
        push @{$queue[$prio]}, $item;
    };
    my $pull_highest_prio = sub {
        for my $item (reverse @queue) {
            next unless defined $item;
            return shift @$item if @$item > 0;
        }
    };
    return $is_empty, $insert_with_prio, $pull_highest_prio;
}

The new_queue subroutine creates an empty @queue object and returns three coderefs ($is_empty, $insert_with_prio, and $pull_highest_prio). The new_queue subroutine can be considered as a function factory which manufactures three coderefs for the benefit of the caller. Each of the three coderefs is an anonymous function that closes over the @queue AoA (in other words, they are closures).

The code that will call the new_queue "constructor" will only receive in return the three coderefs (let's call them "methods"). It will have no access whatsoever to the @queue object other than using these methods, so that's pretty good encapsulation.

Let's now look at the calling code:

my ($is_empty, $insert, $pull_prio) = new_queue;
for my $num (1..20) {   # inserting 20 items into the queue
    $insert->($num,
        $num % 10 == 0 ? 10 :
        $num % 5  == 0 ? 5  :
        $num % 3  == 0 ? 3  :
        $num % 2  == 0 ? 2  : 
        1);
}        
for my $num (1..20) {
    say $pull_prio->();
}
say "Empty queue" if $is_empty->();

The first code line is a call to the object constructor and the code retrieves the three coderefs (the methods).

The rest of the code above is just for testing purpose. We will insert into our queue 20 numbers (between 1 and 20) and assign them priorities as follows:

  • Priority 10 for numbers evenly divisible by 10;

  • Priority 5 for numbers evenly divisible by 5 (but not 10 and 20);

  • Priority 3 for numbers evenly divisible by 3 (but not 15);

  • Priority 2 for other even numbers;

  • Priority 1 for other numbers.

This is the structure of the @queue after insertion of the 20 items with their priority:

0  undef
1  ARRAY(0x6003018a0)
   0  1
   1  7
   2  11
   3  13
   4  17
   5  19
2  ARRAY(0x600301978)
   0  2
   1  4
   2  8
   3  14
   4  16
3  ARRAY(0x6007fdf90)
   0  3
   1  6
   2  9
   3  12
   4  18
4  undef
5  ARRAY(0x600842a88)
   0  5
   1  15
6  undef
7  undef
8  undef
9  undef
10  ARRAY(0x600842bf0)
   0  10
   1  20

So, when running this code, we should get the items in the following order: 10, 20, 5, 15, numbers evenly divided by 3 (except 15) in ascending order, etc. And we should finally get a "Empty queue" message if we have exhausted the queue. Let's see:

$ perl  queues.pl
10
20
5
15
3
6
9
12
18
2
4
8
14
16
1
7
11
13
17
19
Empty queue

Looks good, doesn't it?

Priority Queues in Perl 6

For the Perl 6 implementation, we want use a binary heap to store the data. But this requires quite a bit of explanations, and this blog post is going to get too long.

So, I'll stop here for now and prepare another blog post for that part in the next couple of days. Stay tuned if you want to know about priorities queues using binary heaps.

Update: This is the new post on heaps and priority queues in Perl 6 (and also in Perl 5)

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 4. And, please, also spread the word about the Perl Weekly Challenge if you can.

And see you hopefully at the European Perl Conference in Riga, Aug. 7 to 9, 2019.