Perl Weekly Challenge # 13: Fridays and Mutually Recursive Subroutines

These are some answers to the Week 13 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (June 23, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Thank God, It's Friday

Write a script to print the date of last Friday of every month of a given year. For example, if the given year is 2019 then it should print the following:

2019/01/25
2019/02/22
2019/03/29
2019/04/26
2019/05/31
2019/06/28
2019/07/26
2019/08/30
2019/09/27
2019/10/25
2019/11/29
2019/12/27

Fridays in Perl 5

For this challenge, I'll use the Time::Local core module to generate dates within the given year (dates between the 21st and the 31st of each month) and the built-in gmtime function to find out if a given date occurs on a Friday. A small difficulty, however is that the timegm function of this module fails and exits if the date passed is invalid. So I used instead the timegm_nocheck which does not fail on, for example, Feb. 30, but returns in that case the epoch for Feb. 1. So, I only needed to check whether that function returned a different day in the month to stop the loop (the last statement in the code below) on non existing dates.

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";
use Time::Local qw/timegm_nocheck/ ;


my $year = shift // 2019;
for my $month (0..11) {
    my @last_friday;
    for my $day (20..31) {
        my $epoch = 
            timegm_nocheck(0, 0, 12, $day, $month, $year - 1900);
        my @date_details = gmtime $epoch;
        last if $date_details[3] != $day;
        @last_friday = @date_details if $date_details[6] == 5;
    }
    printf "%d/%02d/%d\n", $year, $month + 1, $last_friday[3];
}

The script uses a parameter passed to it for the year and defaults to 2019 if no parameter is passed.

This outputs the following:

$ perl friday.pl
2019/01/25
2019/02/22
2019/03/29
2019/04/26
2019/05/31
2019/06/28
2019/07/26
2019/08/30
2019/09/27
2019/10/25
2019/11/29
2019/12/27

I admit that this program relies on a non-documented behavior, which is not the best and the cleanest. But I don't have much time this week to investigate other modules, as I'll attend a local Perl workshop in the second half of this week.

To avoid relying on non-documented behavior, it isn't difficult to determine the various month lengths, for example:

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";
use Time::Local;

sub is_leap_year {
    my $year = shift;
    return $year % 4 == 0; # works for years between 1901 and 2099
}

my @month_lengths = qw/ 31 28 31 30 31 30 31 31 30 31 30 31/;
my $year = shift // 2019;

for my $month (0..11) {
    my $month_length = $month_lengths[$month];
    $month_length = 29 if $month == 1 
        and is_leap_year $year; # Feb is 1
    my @last_friday;
    for my $day (20..$month_length) {
        my $epoch = timegm(0, 0, 12, $day, $month, $year - 1900);
        my @date_details = gmtime $epoch;
        @last_friday = @date_details if $date_details[6] == 5;
    }
    printf "%d/%02d/%d\n", $year, $month + 1, $last_friday[3];
}

This works correctly, including for a year where the last Friday of February falls on Feb. 29:

$ perl friday.pl 2008
2008/01/25
2008/02/29
2008/03/28
2008/04/25
2008/05/30
2008/06/27
2008/07/25
2008/08/29
2008/09/26
2008/10/31
2008/11/28
2008/12/26

Note that the is_leap_year subroutine above is really computing Julian leap years, not Gregorian leap years commonly used nowadays, but that's fine for all years between 1901 to 2099. For years outside that range, we would need a full Gregorian leap year calculation, for example:

sub is_leap_year {
    my $year = shift;
    return 0 if $year % 4;   # not divisible by 4
    return 1 if $year % 100; # divisible by 4, not by 100
    return 0 if $year % 400; # divisible by 100, not by 400
    return 1;                # divisible by 400
}

Fridays in Perl 6

Perl 6 has powerful Date and DateTime data types which make date and time computations fairly easy. We could have nested loops for months ans days as in the P5 solution, but I decided to follow another path: we look at all the days of the year, filter out those that are not Fridays and print out the Fridays picked up just before any month change:

use v6;

sub MAIN (UInt $year = 2019) {
    my $year-length = 
        DateTime.new(:year($year)).is-leap-year ?? 366 !! 365;
    my $day = Date.new("$year-01-01");
    my $month = 1;
    my $last-friday;
    for 1..$year-length {
         $day = $day.succ;
         next unless $day.day-of-week == 5;
         if $day.month != $month {
            $month = $day.month;
            say $last-friday;
        }
        $last-friday = $day;
    }
    say $last-friday;
}

This prints out the same result as the P5 solution:

~ perl6 friday.pl6 2008
2008-01-25
2008-02-29
2008-03-28
2008-04-25
2008-05-30
2008-06-27
2008-07-25
2008-08-29
2008-09-26
2008-10-31
2008-11-28
2008-12-26

Challenge 2: Mutually Recursive Subroutines and Hofstadter Female and Male Sequences

Write a script to demonstrate Mutually Recursive methods. Two methods are mutually recursive if the first method calls the second and the second calls first in turn. Using the mutually recursive methods, generate Hofstadter Female and Male sequences.

 F ( 0 ) = 1   ;   M ( 0 ) = 0
 F ( n ) = n − M ( F ( n − 1 ) ) , n > 0
 M ( n ) = n − F ( M ( n − 1 ) ) , n > 0.

There is nothing complicated about mutually recursive subroutines. As with any recursive subroutine, you just need to make sure there is a base case to stop recursion (and that the base case will eventually be reached). The Wikipedia link provided in the question gives the beginning of the two sequences, which will help us checking our results:

F: 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 11, 11, 12, 13, ... 
M: 0, 0, 1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9, 10, 11, 11, 12, 12, ...

Hofstadter Female and Male Sequences in Perl 5

We just need to apply the mathematical definition:

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";

sub female {
    my $n = shift;
    return 1 if $n == 0;   # base case
    return $n - male (female ($n - 1));
}
sub male {
    my $n = shift;
    return 0 if $n == 0;   #base case
    return $n - female (male ($n - 1));
}
say "Female sequence: ";
printf "%d ", female $_ for 0..30;
say "";
say "Male sequence:";
printf "%d ", male $_ for 0..30;
say "";

This displays the following output:

$ perl hofstadter.pl
Female sequence:
1 1 2 2 3 3 4 5 5 6 6 7 8 8 9 9 10 11 11 12 13 13 14 14 15 16 16 17 17 18 19
Male sequence:
0 0 1 2 2 3 4 4 5 6 6 7 7 8 9 9 10 11 11 12 12 13 14 14 15 16 16 17 17 18 19

Performance Problems for Large Input Values

The female and male subroutines will become very slow for (moderately) large input values. For example, changing the program to print only the female value of a number passed as an argument to the program (and removing the deep recursion warnings with a no warnings "recursion"; pragma), I obtain the following run times for input values 50, 100, 150 and 200:

$ time perl hofstadter.pl 50
Female sequence:
31

real    0m0,054s
user    0m0,015s
sys     0m0,030s

Laurent@LAPTOP-LHI8GLRC ~
$ time perl hofstadter.pl 100
Female sequence:
62

real    0m0,866s
user    0m0,828s
sys     0m0,031s

Laurent@LAPTOP-LHI8GLRC ~
$ time perl hofstadter.pl 150
Female sequence:
93

real    0m11,728s
user    0m11,656s
sys     0m0,047s

Laurent@LAPTOP-LHI8GLRC ~
$ time perl hofstadter.pl 200
Female sequence:
124

real    1m29,175s
user    1m28,984s
sys     0m0,030s

As it can be seen, execution times grow very fast even for moderate increases of the input value: 0.86 sec for 100, 11.7 sec for 1500 and 89 sec for 200. The figures are typical of an exponential complexity (the durations are more or less multiplied by 10 each time you add 50 to the input value). This isn't gonna work if you want significantly larger input values.

Solving the Performance Problem with the Memoize Module

For larger input values, it would be necessary to cache the intermediate results to avoid recomputing them again and again. For this, we can use Mark Jason Dominus's Memoize core module, which does the caching (or, more precisely, memoization) for you with very little effort:

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";
no warnings "recursion";
use Memoize;

memoize('female', 'male');

sub female {
    my $n = shift;
    return 1 if $n == 0;   # base case
    return $n - male (female ($n - 1));
}
sub male {
    my $n = shift;
    return 0 if $n == 0;   #base case
    return $n - female (male ($n - 1));
}
say "Female sequence: ";
say female shift;

Now the execution times become small (less than 0.1 sec for an input value of 200) and no longer grow exponentially:

$ time perl hofstadter.pl 100
Female sequence:
62

real    0m0,072s
user    0m0,015s
sys     0m0,015s

Laurent@LAPTOP-LHI8GLRC ~
$ time perl hofstadter.pl 150
Female sequence:
93

real    0m0,085s
user    0m0,015s
sys     0m0,046s

Laurent@LAPTOP-LHI8GLRC ~
$ time perl hofstadter.pl 200
Female sequence:
124

real    0m0,093s
user    0m0,046s
sys     0m0,030s

For an input value of 200, the program now runs almost 1,000 times faster. Even with an input value of 1000, the run time is less than a second (real: 0,829s).

Of course, it wouldn't be very complicated to do the caching yourself, but I wanted to use the opportunity to show this beautiful Memoize module, which so simple to use (I've just added two very simple code lines) and so efficient. Take a look at the Perl 6 version below if you want some ideas on how to do the caching manually.

Hofstadter Female and Male Sequences in Perl 6

Very little change (basically only the introduction of subroutine signatures) is required to translate the first P5 version into Perl 6:

use v6;

sub female (UInt:D $n) {
    return 1 if $n == 0;   # base case
    return $n - male (female ($n - 1));
}
sub male (UInt:D $n) {
    return 0 if $n == 0;   #base case
    return $n - female (male ($n - 1));
}
say "Female sequence:";
printf "%d ", female $_ for 0..30;
say "";
say "Male sequence:";
printf "%d ", male $_ for 0..30;

This displays the same as the P5 version:

Female sequence:
1 1 2 2 3 3 4 5 5 6 6 7 8 8 9 9 10 11 11 12 13 13 14 14 15 16 16 17 17 18 19
Male sequence:
0 0 1 2 2 3 4 4 5 6 6 7 7 8 9 9 10 11 11 12 12 13 14 14 15 16 16 17 17 18 19

Using Multi Subs for Dealing With the Base Case

Perl 6 has the notion of multi subs, which might be used for dealing with the base case needed to stop recursion. Multi subs are subroutines with the same name but a different signature to deal with different cases.

multi sub female (0) { 1; }   # base case
multi sub female (UInt:D $n) {
    return $n - male (female ($n - 1));
}
multi sub male (0) { 0; }    # base case
multi sub male (UInt:D $n) {
    return $n - female (male ($n - 1));
}
say "Female sequence:";
printf "%d ", female $_ for 0..30;
say "";
say "Male sequence:";
printf "%d ", male $_ for 0..30;

This prints the same output as before.

The Performance Problem, Again

Of course, for large input values, we have the same performance problem as in Perl 5, except that it is even worse since Perl 6 is currently significantly slower than Perl 5.

Let's change the script to compute only one female value and measure the time taken:

use v6;

sub female (UInt:D $n) {
    return 1 if $n == 0;   # base case
    return $n - male (female ($n - 1));
}
sub male (UInt:D $n) {
    return 0 if $n == 0;   #base case
    return $n - female (male ($n - 1));
}
sub MAIN (UInt $input) {
    say "Female $input: ", female $input;
    say "Time taken: ", now - INIT now;
}

These are the execution times for input values 50 and 100:

~ perl6 hofstadter.p6 50
Female 50: 31
Time taken: 0.2816245

Laurent@LAPTOP-LHI8GLRC ~
~ perl6 hofstadter.p6 100
Female 100: 62
Time taken: 10.96803473

They are quite bad.

Caching the Intermediate Results for Solving the Performance Problem

I do not think that Mark Jason Dominus's Memoize module has been ported yet to Perl 6. So let's see how we can cache manually the intermediate results.

We are storing intermediate results in the @*female and @*male dynamic-scope arrays. When the result exists in the array, we just return it, and we compute it and store it in the array when the result is not known yet (and implicitly return the result in the array in that case). Note that the two base cases are now handled through the initialization of the @*female and @*male arrays

use v6;

sub female (UInt:D $n) {
    return @*female[$n] if defined @*female[$n];
    @*female[$n] = $n - male (female ($n - 1));
}
sub male (UInt:D $n) {
    return @*male[$n] if defined @*male[$n];
    @*male[$n] = $n - female (male ($n - 1));
}
sub MAIN (UInt $input) {
    my @*female = 1,;
    my @*male = 0,;
    say "Female $input: ", female $input;
    say "Time taken: ", now - INIT now;
}

Now, this is very fast:

~ perl6 hofstadter.p6 50
Female 50: 31
Time taken: 0.0124059

Laurent@LAPTOP-LHI8GLRC ~
~ perl6 hofstadter.p6 100
Female 100: 62
Time taken: 0.02777636

For the female of 1,000, it seems that our manually optimized P6 program runs even much faster than the memoized Perl 5 version:

~ perl6 hofstadter.p6 1000
Female 1000: 618
Time taken: 0.0585017

Once again, point demonstrated: yes, unfortunately, Perl 6 is currently significantly slower than Perl 5, but that may not matter very much if you take the time to optimize your algorithm when possible. The hand-crafted caching done in the program above took about 10 minutes of my time.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate to this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on Sunday, June 30. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 12: Euclid's Numbers and Directories

These are some answers to the Week 12 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (June 16, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Euclid's Numbers

The numbers formed by adding one to the products of the smallest primes are called the Euclid Numbers (see wiki). Write a script that finds the smallest Euclid Number that is not prime. This challenge was proposed by Laurent Rosenfeld.

I did not even remember I proposed this challenge to my friend Mohammad Anwar.

So far, in my blogs about the Perl Weekly Challenge, I have always prepared and presented the Perl 5 solutions first and then the Perl 6 solutions, as it seems to be slightly more natural to do it in this order. This time, for a change, I'll do it the other way around and start with a Perl 6 solution. This way, I'll not be tempted to just translate a P5 solution into P6.

Euclid's Numbers in Perl 6

For this, we can use two infinite (lazy) lists: one for the primes and one for Euclid's numbers, and then pick up the first Euclid's number that is not prime:

use v6;

my @primes = grep {.is-prime}, 1..*;
my @euclids = map {1 + [*] @primes[0..$_]}, 0..*;
say @euclids.first(not *.is-prime);

which prints 30031 (which is not prime as it is the product 59 × 509).

Note that we don't really need to populate an intermediate temporary array with Euclid's numbers and can find directly the first such number that is not prime:

use v6;

my @primes = grep {.is-prime}, 1..*;
say (map {1 + [*] @primes[0..$_]}, 0..*).first(not *.is-prime);

But it probably wouldn't make much sense to also try to get rid of the @primes array, because we are in fact using it many times in the process of computing Euclid's numbers, so it is probably better to cache the primes.

Euclid's Numbers in Perl 5

For this challenge, I reused the find_primes and is_prime subroutines that I described in some details in my previous blog post about Perl Weekly Challenge 8 on perfect numbers (and some other earlier posts). Please refer to that blog if you need explanations on these subroutines. Once you have these subroutines to generate a list of prime numbers, generating a list of Euclid's numbers and checking whether each generated Euclid's number is prime is straight forward:

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";
use constant largest_num => 1000;

sub find_primes {
    my $num = 3;
    my @primes = (2, 3);
    while (1) {
        $num += 2;     # check only odd numbers
        last if $num > largest_num;
        my $limit = int $num ** 0.5;
        my $num_is_prime = 1;
        for my $prime (@primes) {
            last if $prime > $limit;
            if ($num % $prime == 0) {
                # $num evenly divided by $prime, exit the for loop
                $num_is_prime = 0;
                last;
            }
        }
        push @primes, $num if $num_is_prime;
    }
    return @primes;
}

my @prime_numbers = find_primes;  

sub is_prime {
    my $num = shift;
    my $limit = 1 + int $num ** 0.5;
    for my $p (@prime_numbers) {
        return 1 if $p > $limit;
        return 0 if $num % $p == 0;
    }
    warn "Something got wrong (primes list too small)\n";
    return 0; # If we've reached this point, then our list of 
              # primes is too small, we don't know if the argument
              # is prime, issue a warning and return a false 
              # value to be on the safe side of things
}

for my $i (0..20) {
    my $euclid_1 = 1;
    $euclid_1 *= $prime_numbers[$_] for 0..$i;
    my $euclid = $euclid_1 + 1;
    say $euclid and last unless is_prime $euclid;   
}

The program displays the following output:

$ perl euclid.pl
30031

Common Directory Paths

Write a script that finds the common directory path, given a collection of paths and directory separator. For example, if the following paths are supplied:

/a/b/c/d
/a/b/cd
/a/b/cc
/a/b/c/d/e

and the path separator is /. Your script should return /a/b as common directory path.

Common Directory Paths in Perl 6

For this, I created the compare-paths subroutine to compare two paths, and then use the reduce built-in function to apply compare-paths to the whole list of paths:

use v6;
sub compare-paths ($a, $b) {
    join $*sep, 
        gather for $a.split($*sep) Z $b.split($*sep) -> ($p, $q) {
            last unless $p eq $q;
            take $p;
        }
}
my $*sep = '/';
my @paths = </a/b/c /a/b/c/e /a/b/c/d/e /a/b/c/f>;
say reduce &compare-paths, @paths;

which duly displays /a/b/c.

The compare-paths subroutine splits both paths on the separator, uses the "zip" operator to create pairs of path parts and checks which parts are equal. The gather/take construct picks up the parts that are the same and returns the corresponding path.

Another way to solve the challenge would be to create a new compare-paths operator and use the [] reduction meta-operator to generate the result:

use v6;
sub infix:<compare-paths> ($a, $b) {
    join $*sep, 
        gather for $a.split($*sep) Z $b.split($*sep) -> ($p, $q) {
            last unless $p eq $q;
            take $p;
        }
}
my $*sep = '/';
my @paths = </a/b/c /a/b/c/e /a/b/c/d/e /a/b/c/f>;
say [compare-paths] @paths;

Common Directory Paths in Perl 5

Here is a way to do it in Perl 5:

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";

die "This program needs a separator and at least 2 paths\n" 
    if @ARGV < 3;
my ($separator, @paths) = @ARGV;
chomp @paths;
my @common_path = split $separator, shift @paths;
for my $new_path (@paths) {
    my @new_path_pieces = split $separator, $new_path;
    my $min_length = @new_path_pieces < @common_path ? 
        @new_path_pieces : @common_path;
    for my $i (0..$min_length - 1) {
        if ($common_path[$i] ne $new_path_pieces[$i]) {
            @common_path = @common_path[0..$i-1];
            last;
        }
    }
}
say join $separator, @common_path;

Note, however, that the List::Util core module also provides a reduce subroutine making it possible to create a solution similar to the P6 solution:

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";
use List::Util qw/reduce/;

sub compare {
    my ($sep, $p1, $p2) = @_;
    my @path1 = split /$sep/, $p1;
    my @path2 = split /$sep/, $p2;
    my $min_length = @path1 < @path2 ? @path1 : @path2;
    for my $i (0..$min_length - 1) {
        if ($path1[$i] ne $path2[$i]) {
            return join $sep, @path1[0..$i-1];
        }
    }
    return join $sep, @path1[0..$min_length - 1];
}

die "This program needs a separator and at least 2 paths\n" 
    if @ARGV < 3;
my ($separator, @paths) = @ARGV;
chomp @paths;
say reduce {compare($separator, $a, $b)} @paths;

The Perl 5 solution is still much less concise than the Perl 6 solution.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate to this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on Sunday, June 23. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 11: Fahrenheit Celsius and Identity Matrix

These are some answers to the Week 11 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (June 9, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Fahrenheit and Celsius Scales Intersection

Write a script that computes the equal point in the Fahrenheit and Celsius scales, knowing that the freezing point of water is 32 °F and 0 °C, and that the boiling point of water is 212 °F and 100 °C. This challenge was proposed by Laurent Rosenfeld.

While it is very easy to do the math by hand, find that $Tf = 9/5 * $Tc + 32 and then solve the linear equation x = 9/5 x + 32, it turns out that it is a little bit more complex to write a program doing that with a general purpose programming language. I guess it is probably easy with symbolic or algebraic computation software such as Maple, Mathematica, or Scilab. I'm working on a solution that might do that, but I'm not sure when it will be ready (and even whether it will work out the way I hope). Here we'll use successive numerical approximations.

Fahrenheit Celsius in Perl 5

In the code below, the $speed_diff variable measures the difference of variation between the two scales. It is the coefficient of the slope of the straight line representing one scale compared to another in a graphic representation (and its value is 9/5, or 1.8). In other word, if you move on the Celsius scale by 1 °, you have to move on the Fahrenheit scale by 1.8 °.

use strict;
use warnings;
use feature qw/say/;

my %freeze = (F => 32, C => 0);
my %boil = (F => 212, C => 100);
my $speed_diff = ($boil{F} - $freeze{F} ) /($boil{C} - $freeze{C} ) ;

my $c = $freeze{C};
my $f = $freeze{F};

while (abs($c - $f) > 0.2) {
    $f += ($c - $f)/2;
    $c +=  ( ($c - $f) / $speed_diff );
    # say "Approx: $c $f ", $c-$f; # De-comment this line to see the intermediate results 
} 
printf "Fahrenheit and Celsius scales meet at: %.0f\n", $c;

The program displays the following final result (after about 20 iterations):

$ perl Fahrenheit.pl
Fahrenheit and Celsius scales meet at: -40

Note that we use two hashes to store the input values without hard-coding them, but we could have used 4 variables.

Fahrenheit Celsius in Perl 6

Translating the same program in Perl 6 is easy:

use v6;

#`{ Write a script that computes the equal point in the Fahrenheit 
    and Celsius scales, knowing that the freezing point of water 
    is 32 °F and 0 °C, and that the boiling point of water is 212 °F 
    and 100 °C.
  }


my %freeze = (F => 32, C => 0);
my %boil = (F => 212, C => 100);
my $speed_diff = (%boil<F> - %freeze<F> ) / (%boil<C> - %freeze<C> );

my $c = %freeze<C>;
my $f = %freeze<F>;

while abs($c - $f) > 0.2 {
    $f += ($c - $f)/2;
    $c +=  ( ($c - $f) / $speed_diff );
    # say "Approx: $c $f ", $c-$f;
} 
say "Fahrenheit and Celsius scales meet at: ", $c.fmt("%.0f");

And we obtain the same value as in P5 (-40, i.e. -40 °C and -40 °F).

Note that Perl 6 has an approximately-equal operator (=~=) that could have been used in the while loop for comparing $c and $f, but using it would require to first set the tolerance value, so the benefit is limited when used only once. This part of the program could look like this:

{
    my $*TOLERANCE = .01;
    while not $c =~= $f  {
        $f += ($c - $f)/2;
        $c +=  ( ($c - $f) / $speed_diff );
        # say "Approx: $c $f ", $c-$f; 
    } 
}

The $*TOLERANCE value is relative tolerance, not absolute tolerance, so I had to use another value (1%) than the absolute tolerance hard-coded in the original while loop. In brief, using this functionality does not simplify the code for the case in point.

Challenge # 2: Identity Matrix

Write a script to create an Identity Matrix for the given size. For example, if the size is 4, then create Identity Matrix 4x4. For more information about Identity Matrix, please read the Wikipedia page.

An identity matrix is a square matrix with ones on the main diagonal (top left to bottom right) and zeroes everywhere else.

Identity Matrix in Perl 5

Let's start with a boring plain-vanilla solution using two nested loops, as I would probably do in C, Pascal, Ada, or Java:

use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

my $size = shift() - 1;
my @matrix;
for my $i (0..$size) {
    for my $j (0..$size-1) {
        if ($i == $j) {
            $matrix[$i][$j] = 1;
        } else {
            $matrix[$i][$j] = 0;
        }
    }
}
say Dumper \@matrix;

When the row index ($i) is equal to the column index ($j), we populate the item with a one. Otherwise, we fill with zeroes.

This program displays the following:

$ perl matrix.pl 3
$VAR1 = [
          [
            1,
            0,
            0
          ],
          [
            0,
            1,
            0
          ],
          [
            0,
            0,
            1
          ]
        ];

One problem here is that the Data::Dumper module is good to show the data structure in general, but it requires here some careful check to see that this is an identity matrix.

Let's improve our script by adding a subroutine to pretty print our matrix. While we are at it, let's also simplify the code by replacing the if ... else conditional with a ternary operator:

use strict;
use warnings;
use feature qw/say/;

sub pretty_print {
    my $matrix_ref = shift;
    for my $rows (@$matrix_ref) {
        say join " ", @$rows;
    }
}

my $size = shift() - 1;
my @matrix;
for my $i (0..$size) {
    for my $j (0..$size) {
        $matrix[$i][$j] = $i == $j ? 1 : 0;
    }
}
pretty_print \@matrix;

Now, the output really looks like an identity matrix:

$ perl matrix.pl 4
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

We could further simplify the code of the nested loops:

for my $i (0..$size) {
    $matrix[$i][$_] = $i == $_ ? 1 : 0 for 0..$size;
}

But that simplification is only mildly interesting. Perhaps, we can do better with the functional paradigm.

Identity Matrix in Functional Perl 5

In my Perl Weekly Challenge # 9: Square Numbers and Functional Programming in Perl blog post on May 26, 2019, I showed some ways to use the functional programming style both in Perl 5 and Perl 6, especially the data stream or data pipeline model. Remember especially that map takes a list of items (e.g. an array) as parameter, applies some transformation to each item, and returns the list of modify items.

Let's use the same techniques for the identity matrix challenge:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub prettify {
    my $matrix = shift;
    return join "\n", map { join " ", @$_ } @$matrix;
}

my $size = shift() - 1;
my @matrix = map { my $i = $_; [map { $i == $_ ? 1 : 0 } 0..$size] }
    0..$size;
say prettify \@matrix;

Note that, since we're aiming at doing functional programming, we have replaced the pretty_print subroutine by a pure function, prettify, with no side effect: it does not print anything, but only returns a printable formatted string, which the user may then decide to print (as we did here), or may possibly do something else, such as storing it into a file for future use.

The script displays the same formatted output as before:

$ perl matrix.pl 5
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1

Identity Matrix in Perl 6

I was initially tempted to provide only a functional implementation in Perl 6, but for the benefit of Perl 6 beginners who might read this blog post, I'll start with the plain-vanilla nested loops as in P5:

use v6;

sub pretty-print (@matrix) {
    for @matrix -> @rows {
        say join " ", @rows;
    }
}
sub MAIN (Int $size where * > 0) {
    my @matrix;
    $size--;
    for 0..$size -> $i {
        for 0..$size -> $j {
            @matrix[$i][$j] = $i == $j ?? 1 !! 0;
        }
    }
    pretty-print @matrix;
}

Here, we are using the MAIN subroutine to process the argument passed to the script (which, in this case, must be a strictly positive integer).

The output is the same as in P5:

~ perl6 matrix.p6 5
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1

And if the user omits to pass a parameter (or passes an invalid one), a convenient usage line is printed to the screen:

~ perl6 matrix.p6
Usage:
  matrix.p6 <size>

Also note that we don't need in Perl 6 the syntax complications associated with references and dereferencing in P5.

Identity Matrix in Functional Perl 6

A functional implementation in P6 may look like this:

use v6;

sub prettify (@matrix) {
    return join "\n", map { join " ", $_}, @matrix;
}
sub MAIN (Int $size where * > 0) {
    my @matrix = map { my $i = $_; map { $i == $_ ?? 1 !! 0 }, 
        0..$size },  0..$size;
    say prettify @matrix;
}

Note that the populating the identity matrix takes only one code line. The rest is really for prettifying the result and printing it.

We could also use data pipelines with chained method invocations:

sub prettify (@matrix) {
    @matrix.map({join(" ",$_)}).join("\n");
}
sub MAIN (Int $size where * > 0) {
    say prettify (1..$size).map( -> $i
         { (1..$size).map( { $_ == $i ?? 1 !! 0 })});
}

although I'm not really convinced this is any better, as it is a bit difficult to get these pesky closing parentheses and curly brackets right.

Wrapping up

There was a third challenge this week: Using Open Weather Map API, write a script to fetch the current weather for an arbitrary city. Note that you will need to sign up for Open Weather Map’s free tier and then wait a couple hours before your API key will be valid. This challenge was proposed by Joelle Maslak. The API challenge is optional but would love to see your solution.

As mentioned in earlier blog posts, I know next to nothing about this kind of topic, so I won't undertake anything on that subject and even less blog about it. Please try this challenge and provide answers if you know more that I do on such topic.

The next week Perl Weekly Challenge is due to start soon. If you're interested in participating in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on Sunday, June 16. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 10: Roman Numerals and the Jaro-Winkler Distance

These are some answers to the Week 10 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (June 2, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Roman Numerals

Write a script to encode/decode Roman numerals. For example, given Roman numeral CCXLVI, it should return 246. Similarly, for decimal number 39, it should return XXXIX. Checkout Wikipedia page for more information.

Of course, there are some Perl modules on the CPAN to convert from and to Roman numerals, but there wouldn't be any challenge if the idea were to use an existing module.

Most people know more or less how Roman numerals work. They use Latin letters to represent numbers:

|---------------------------------------------------|
| Symbol |  I  |  V  |  X  |  L  |  C  |  D  |  M   |
|---------------------------------------------------|
| Value  |  1  |  5  |  10 |  50 | 100 | 500 | 1000 |
|---------------------------------------------------|

In general, Roman numerals use additive notation: for example, MCLXXIII means 1000 + 100 + 50 + 20 + 3 = 1173. Or, at least, this is so when the symbols are written from left to right in decreasing value order.

If, however, a given symbol has a smaller value than a symbol placed on its right, then this is an example of subtractive notation: in that case, the smaller symbol is subtracted from the one its right. For example, IV means 1 subtracted from 5, i.e. 5 - 1 = 4. Similarly, IX and XC respectively mean 10 - 1 = 9 and 100 - 10 = 90. And MCMXLIX corresponds to 1000 + ( 1000 - 100) + (50 - 10) + (10 - 1) = 1949.

The overall problem, though, is that there is no general standard for Roman numerals. Applying the rules above makes it possible to decode more or less unambiguously any Roman numeral coded according to such aforesaid rules, but there may be several different possible ways to encode a number into a Roman numeral. For example, 99 could be encoded as XCXI or IC (or even XCVIIII or possibly LXXXXVIIII). The first transcription (XCXI) seems to be the most frequent one, so this is the one we will chose when encoding to Roman numerals. Still, IC seems to be a valid Roman numeral for 99, so we will try at least to be able to decode it if we find it.

Note that there is no Roman numeral for zero and the largest possible Roman numeral with the above rules is 3,999.

Roman Numerals in Perl 5

If Roman numerals only had the additive notation, it would be very easy: for converting a Roman numeral, just pick up each of the symbols in turn, add them up, and you're done. The trouble comes with subtractive notation.

So my first idea to decode a Roman numeral was to remove any subtractive notation part from the input Roman numeral and replace it with an additive notation. For example, given the numeral MCIX, I would replace IX with VIIII, thus yielding MCVIIII; it is now very easy to add the symbols' values to find 1009. We can use a series of regex substitutions for that:

sub remove_subtractive {
    my $roman = shift;
    for ($roman) {
        s/IV/IIII/;             # 4
        s/IX/VIIII/;            # 9    
        s/IL/XLVIIII/;          # 49
        s/XL/XXXX/;             # 40 to 49
        s/IC/LXXXXVIIII/;       # 99
        s/XC/LXXXX/;            # 90 to 98
        s/ID/XDVIIII/;          # 499
        s/XD/CDLXXXX/;          # 490 to 499
        s/CD/CCCC/;             # 400 to 499
        s/IM/CMLXXXXVIIII/;     # 999
        s/XM/CMLXXXX/;          # 990 to 998
        s/CM/DCCCC/;            # 900 to 999
    }
    return $roman;
}

sub from_roman {
    my $additive = remove_subtractive uc shift;
    my $arabic = 0; 
    for (split //, $additive){
        $arabic += $roman_table{$_};
    }
    return $arabic;
}

That's of course way too complicated. As soon as I started typing the first few of these regex substitutions in the remove_subtractive subroutine, I knew I wanted to find a better way to decode Roman numerals. I nonetheless completed it, because I wanted to show it on this blog. I also tested it quite thoroughly, and it seems to work properly.

The new idea is to read the symbols one by one from left to right and to add the values, keeping track of the previously seen value. If the current value is larger than the previous value, then we were in a case of a subtractive combination at the previous step, and we need to subtract twice the previous value (once because it is a subtractive combination, and once again because we have previously erroneously added it). That's actually quite simple (see how the code of the from_roman subroutine below is much shorter and simpler than what we had tried above).

For encoding Arabic numerals to Roman numerals, the easiest is to perform integer division with decreasing values corresponding to Roman numerals (i.e. M D C L X V I). For example, suppose we want to encode 2019. We first try to divide by 1,000 (corresponding to M). We get 2, so the start of the string representing the Roman numeral will be MM. Then we continue with the remainder, i.e. 19. We try integer division successively with 500, 100 and 50 and get 0, so don't do anything with the result. Next we try with 10 and get 1, so the temporary result is now MMX. The remainder is 9; if we continue the same way with 9, we would divide by 5, add V to our string, and eventually obtain MMXVIIII, which is a correct (simplistic) Roman numeral for 2019, but not really what we want, since we want to apply the subtractive combination and get MMXIX.

Rather than reprocessing VIIII into IX (we've seen before how tedious this could be with regexes), we can observe that if our list of decreasing Roman values also includes IX (9), then it will work straight without any need to reprocess the result. So, our list of decreasing values corresponding to Roman numperals needs to be augmented with subtractive cases to M CM D CD C XC L XL X IX V IV I (corresponding to numbers 1000, 900, 500, 100, 90, 50, 40, 10, 9, 5, 4, 1). Using this list instead of the original one removes any need for special processing for subtractive combinations: we just need to keep doing integer divisions with the decreasing values and continue the rocessing with the remainder. This what the to_roman subroutine below does.

So our program to convert from and to Roman numerals is as follows:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;


my %rom_tab = (I => 1,  V => 5, X => 10, L => 50, C => 100, D => 500, M => 1000);

sub from_roman {
    my $roman = uc shift;
    my $numeric = 0;
    my $prev_letter = "M";
    for my $letter (split //, $roman) {
        $numeric -= 2 * $rom_tab{$prev_letter} 
            if $rom_tab{$letter} > $rom_tab{$prev_letter};
        $numeric += $rom_tab{$letter};
        $prev_letter = $letter;
    }
    return $numeric;
}

sub to_roman {
    my $arabic = shift;
    warn "$arabic out of bounds" unless $arabic > 0 and $arabic < 4000;
    my %hash = %rom_tab;
    $hash{$_->[0]} = $_->[1] for (['IV', 4], ['IX', 9], ['XL', 40], 
        ['XC', 90], ['CD', 400], ['CM', 900] );
    my $roman = "";
    for my $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
        my $num = int ($arabic / $hash{$key});
        $roman .= $key x $num;
        $arabic -= $hash{$key} * $num; 
    }
    return $roman;
}

say "From Roman to Arabic";
say "$_\t=>\t", from_roman $_ for qw <MM MCM LXXIII XCIII IC XCIX xv>;

my @test_nums = qw <19 42 67 90 97 99 429 498 687 938 949 996 2145 3597>;
say "From Arabic to Roman";
say "$_\t=>\t", to_roman $_ for @test_nums;

say "Some round trips: from Arabic to Roman to Arabic";
say "$_\t=>\t", from_roman to_roman $_ for @test_nums;

say "Sanity check (round trip through the whole range)";
for (1..3999) {
    my $result = from_roman to_roman $_;
    say "Error on $_ " unless $result == $_;
}

And this is the result of the various tests:

$ perl roman.pl
From Roman to Arabic
MM      =>      2000
MCM     =>      1900
LXXIII  =>      73
XCIII   =>      93
IC      =>      99
XCIX    =>      99
xv      =>      15
From Arabic to Roman
19      =>      XIX
42      =>      XLII
67      =>      LXVII
90      =>      XC
97      =>      XCVII
99      =>      XCIX
429     =>      CDXXIX
498     =>      CDXCVIII
687     =>      DCLXXXVII
938     =>      CMXXXVIII
949     =>      CMXLIX
996     =>      CMXCVI
2145    =>      MMCXLV
3597    =>      MMMDXCVII
Some round trips: from Arabic to Roman to Arabic
19      =>      19
42      =>      42
67      =>      67
90      =>      90
97      =>      97
99      =>      99
429     =>      429
498     =>      498
687     =>      687
938     =>      938
949     =>      949
996     =>      996
2145    =>      2145
3597    =>      3597
Sanity check (round trip through the whole range)

Note that the sanity check (the last test) does not print anything because the round trip worked correctly for the whole range between 1 and 3,999 (it would display only anomalies, but did not find any).

Also note that, in the Roman to Arabic conversion, both IC and XCIX return 99, as expected, whereas, in the opposite conversion, 99 returns XCIX.

Roman Numerals in Perl 6

Since we think that we now have a good algorithm to convert to and from Roman numerals (and we don't see any P6 feature to significantly simplify it), we can just translate that into Perl 6:

use v6;

subset Roman-str of Str where $_ ~~ /^<[IVXLCDMivxlcdm]>+$/;

my %rom-tab = < I 1   V 5   X 10   L 50   C 100  D 500  M 1000 
               IV 4  IX 9   XL 40  XC 90  CD 400   CM 900 >;
my @ordered_romans = reverse sort { %rom-tab{$_} }, keys %rom-tab;

sub from-roman (Roman-str $roman) {
    my $numeric = 0;
    my $prev_letter = "M";
    for $roman.uc.comb -> $letter {
        $numeric -= 2 * %rom-tab{$prev_letter} 
            if %rom-tab{$letter} > %rom-tab{$prev_letter};
        $numeric += %rom-tab{$letter};
        $prev_letter = $letter;
    }
    return $numeric;
}

sub to-roman (Int $arabic is copy where  { 0 < $_ < 4000 }) {
    my $roman = "";
    for @ordered_romans -> $key {
        my $num = ($arabic / %rom-tab{$key}).Int;
        $roman ~= $key x $num;
        $arabic -= %rom-tab{$key} * $num; 
    }
    return $roman;
}

Besides the small necessary syntactical adjustments between P5 and P6, there are two or three additional changes. First, I used the subroutine signatures to add some limited form of data validation. For checking input Roman numerals, I created the Roman-str subtype (well, really, a subset) which accepts strings that are made only with the seven letters used in Roman numerals (both lower and upper case). This makes it possible to validate (to a certain extent) the argument passed to the from-roman subroutine. Of course, some strings made of these letters may still be invalid Roman numerals, but, at least, we'll get an exception if we inadvertently pass an Arabic number to it.

Similarly, since, according to our rules, Roman numerals can represent numbers between 1 and 3,999, the signature of the to-roman subroutine only accepts integers larger than 0 and less than 4,000.

Also, since I knew from the beginning that I was going to use the M CM D CD C XC L XL X IX V IV I sequence of Roman numerals in the to-roman subroutine, I initialized the %rom-tab hash with all these values, rather than adding some of them (those with two letters) to the hash afterwards as it was done in the P5 version. This is not a problem since the other subroutine (from-roman) using the hash only looks up single letters.

Testing the Perl 6 Roman Numerals Program

It would be fairly easy to use almost exactly the same tests as in Perl 5;

say "$_\t=>\t", from_roman $_ for <MM MCM LXXIII XCIII IC XCIX xv>;

my @test_nums =  <19 42 67 90 97 99 429 498 687 938 949 996 2145 3597>;
say "From Arabic to Roman";
say "$_\t=>\t", to_roman $_.Int for @test_nums;

say "Some round trips: from Arabic to Roman to Arabic";
say "$_\t=>\t", from_roman to_roman $_.Int for @test_nums;

say "Sanity check (round trip through the whole range)";
for (1..3999) {
    my $result = from_roman to_roman $_;
    say "Error on $_ " unless $result == $_;
}

These tests work well and produce the same output as the one listed above for the Perl 5 program.

But there is a better way of testing, which I introduced in my blog on week 2 of the Perl Weekly Challenge: the Test module. I was hoping at the time that I would be able to provide a more significant example, and this challenge is a perfect opportunity to do that. This enables us to do much more thorough testing, including tests on exceptions (for example for a type mismatch in a subroutine's argument).

This my example testing code:

use Test;
plan 45;

say "\nFrom Roman to Arabic";
for < MM 2000 MCM 1900 LXXIII 73 XCIII 93 IC 99 XCIX 99 xv 15> -> $roman, $arabic {
    is from-roman($roman), $arabic, "$roman => $arabic";
}
isnt from-roman("VII"), 8, "OK: VII not equal to 8";
for <12 foo bar MCMA> -> $param {
    dies-ok {from-roman $param}, "Caught exception OK in from-roman: wrong parameter";
}
say "\nFrom Arabic to Roman";
my %test-nums = map { $_[0] => $_[1] }, (
    <19 42 67 90 97 99 429 498 687 938 949 996 2145 3597> Z 
    <XIX XLII LXVII XC XCVII XCIX CDXXIX CDXCVIII DCLXXXVII 
     CMXXXVIII CMXLIX CMXCVI MMCXLV MMMDXCVII>);
for %test-nums.keys -> $key {
    is to-roman($key.Int), %test-nums{$key}, "$key => %test-nums{$key}";
}
for 0, 4000, "foobar", 3e6 -> $param {
    dies-ok { to-roman $param}, "Caught exception OK in to-roman: wrong parameter";
}
say "\nSome round trips: from Arabic to Roman to Arabic";
for %test-nums.keys.sort -> $key {
    is from-roman(to-roman $key.Int), $key, "Round trip OK for $key";
}
my $upper-bound = 3999;
say "\nSanity check (round trip through the whole range 1 .. $upper-bound range)";

lives-ok {
    for (1..$upper-bound) -> $arabic {
        die "Failed round trip on $arabic" if from-roman(to-roman $arabic) != $arabic;
    }
}, "Passed round trip on the full 1..$upper-bound range";

The second line above says that we're going to run 45 test cases (the last test case, the sanity check round trip, is actually testing 3,999 subcases, but it counts as only 1 case).

The is function test for equality of its first two arguments (and isnt tests reports "ok" is the values are not equal). The dies-ok checks that the code being tested throws an exception (good here to check that invalid subroutine arguments are rejected) and the lives-ok check that the code block being tested does not throw any exception.

These tests produce the following output:

1..45

From Roman to Arabic
ok 1 - MM => 2000
ok 2 - MCM => 1900
ok 3 - LXXIII => 73
ok 4 - XCIII => 93
ok 5 - IC => 99
ok 6 - XCIX => 99
ok 7 - xv => 15
ok 8 - OK: VII not equal to 8
ok 9 - Caught exception OK in from-roman: wrong parameter
ok 10 - Caught exception OK in from-roman: wrong parameter
ok 11 - Caught exception OK in from-roman: wrong parameter
ok 12 - Caught exception OK in from-roman: wrong parameter

From Arabic to Roman
ok 13 - 687 => DCLXXXVII
ok 14 - 97 => XCVII
ok 15 - 938 => CMXXXVIII
ok 16 - 498 => CDXCVIII
ok 17 - 19 => XIX
ok 18 - 429 => CDXXIX
ok 19 - 3597 => MMMDXCVII
ok 20 - 2145 => MMCXLV
ok 21 - 67 => LXVII
ok 22 - 90 => XC
ok 23 - 99 => XCIX
ok 24 - 996 => CMXCVI
ok 25 - 949 => CMXLIX
ok 26 - 42 => XLII
ok 27 - Caught exception OK in to-roman: wrong parameter
ok 28 - Caught exception OK in to-roman: wrong parameter
ok 29 - Caught exception OK in to-roman: wrong parameter
ok 30 - Caught exception OK in to-roman: wrong parameter

Some round trips: from Arabic to Roman to Arabic
ok 31 - Round trip OK for 19
ok 32 - Round trip OK for 2145
ok 33 - Round trip OK for 3597
ok 34 - Round trip OK for 42
ok 35 - Round trip OK for 429
ok 36 - Round trip OK for 498
ok 37 - Round trip OK for 67
ok 38 - Round trip OK for 687
ok 39 - Round trip OK for 90
ok 40 - Round trip OK for 938
ok 41 - Round trip OK for 949
ok 42 - Round trip OK for 97
ok 43 - Round trip OK for 99
ok 44 - Round trip OK for 996

Sanity check (round trip through the whole range 1 .. 3999 range)
ok 45 - Passed round trip on the full 1..3999 range

If you want to know more about testing strategies in Perl 6, please look at the Perl 6 testing tutorial, the Test module documentation, or the Debugging section of Chapter 14 of my Perl 6 book.

Challenge 2: The Jaro-Winkler Distance

Write a script to find the Jaro-Winkler distance between two strings. For more information check the Wikipedia page.

The Jaro–Winkler distance is a string metric measuring an edit distance between two sequences. The lower the Jaro–Winkler distance for two strings is, the more similar the strings are. The score is normalized such that 1 equates to no similarity and 0 is an exact match.

To find the Jaro-Winkler distance between two strings, we need first to find the Jaro similarity. Then we compute the Jaro-Winkler similarity, and the Jaro-Winkler distance is just 1 minus the Jaro-Winkler similarity.

Jaro-Winkler in Perl 5

This is my attempt at computing the Jaro-Winkler distance in Perl 5:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub max { $_[0] > $_[1] ? $_[0] : $_[1] }
sub min { $_[0] > $_[1] ? $_[1] : $_[0] }

sub simj {
    my ($str1, $str2) = @_;
    my $len1 = length $str1;
    my $len2 = length $str2;
    my $dmax = int (max($len1, $len2) / 2) -1;
    my @st1 = split //, $str1;
    my $i = 0;
    my @matches;
    for my $letter (split //, $str2) {
        push @matches, $letter if (grep { $letter eq $_ } 
            @st1[max(0,$i-$dmax)..min($i+$dmax,$len1-1)]);
        $i++;
    }
    my $nb_matches = scalar @matches;
    return 0 if $nb_matches == 0;
    my %matching_letters = map { $_ => 1} @matches;
    my @matches_str1 = grep exists $matching_letters{$_}, 
        split //, $str1;
    my $disorder = 0;
    for my $i (0..$nb_matches-1) {
        $disorder++ if $matches[$i] ne $matches_str1[$i];
    }
    my $transposition = $disorder / 2;
    return ($nb_matches / $len1 + $nb_matches / $len2 + 
        ($nb_matches - $transposition)/$nb_matches) / 3;
}
sub simw {
    my ($str1, $str2) = @_;
    my $p_constant = 0.1;
    my $length_prefix = 0;
    for my $count (0..3) {
        last if substr $str1, $count, 1 ne substr $str2, $count, 1;
        $length_prefix++;
    }
    my $simj = simj $str1, $str2;
    return $simj + $length_prefix * $p_constant * (1 - $simj);
}

my @tests = ( ["FOO", "BAR"], ["CRATE", "TRACE"], 
    ["CRATE", "CRATE"], ["TRACE", "CRATE"], 
    ["CREATE", "TRACT"], ["DWAYNE", "DUANE"], );
for my $word_pair (@tests) {
    my ($w1, $w2) = @$word_pair;
    my $simw = simw $w1, $w2;
    say "Jaro-Winkler distance between $w1 and $w2 is: ", 1 - $simw;
}

Note that the Wikipedia description of the transposition calculation isn't very detailed, so I'm not sure to really understand it. I think that the way it is done here makes sense, but I'm not sure this is really what is required.

With the tests listed at the bottom of the above program, this program displays the following output:

$ perl jaro_dist.pl
Jaro-Winkler distance between FOO and BAR is: 1
Jaro-Winkler distance between CRATE and TRACE is: 0.266666666666667
Jaro-Winkler distance between CRATE and CRATE is: 0
Jaro-Winkler distance between TRACE and CRATE is: 0.266666666666667
Jaro-Winkler distance between CREATE and TRACT is: 0.3
Jaro-Winkler distance between DWAYNE and DUANE is: 0.177777777777778

Keeping in mind the doubt about the calculation of the transposition, the results seem consistent. In particular, two indentical words have a distance of 0 and two completely different words (no common letter) have a distance of 1.

Jaro-Winkler in Perl 6

This is the same algorithm for computing the Jaro-Winkler distance as in Perl 5:

use v6;

sub simjaro (Str $str1, Str $str2) {
    my $len1 = $str1.chars;
    my $len2 = $str2.chars;
    my $dmax = (max($len1, $len2) / 2).Int -1;
    my @st1 = $str1.comb;
    my $i = 0;
    my @matches;
    for $str2.comb -> $letter {
        push @matches, $letter if (grep { $letter eq $_ }, 
            @st1[max(0,$i-$dmax)..min($i+$dmax,$len1-1)]);
        $i++;
    }
    my $nb_matches = @matches.elems;
    return 0 if $nb_matches == 0;
    my %matching_letters = map { $_ => 1}, @matches;
    my @matches_str1 = grep { %matching_letters{$_}:exists }, 
        $str1.comb;
    my $disorder = 0;
    for 0..$nb_matches-1 -> $i {
        $disorder++ if @matches[$i] ne @matches_str1[$i];
    }
    my $transposition = $disorder / 2;
    return ($nb_matches / $len1 + $nb_matches / $len2 + 
        ($nb_matches - $transposition)/$nb_matches) / 3;
}
sub simwinkler (Str $str1, Str $str2) {
    my $p_constant = 0.1;
    my $length_prefix = 0;
    for 0..3 -> $count {
        last if substr $str1, $count, 1 ne substr $str2, $count, 1;
        $length_prefix++;
    }
    my $simj = simjaro $str1, $str2;
    return $simj + $length_prefix * $p_constant * (1 - $simj);
}

my @tests = < FOOB BARF   CRATE TRACE   CRATE CRATE   TRACE CRATE   
    CREATE TRACT   DWAYNE DUANE >;
for @tests -> $w1, $w2 {
    my $simw = simwinkler $w1, $w2;
    say "Jaro-Winkler distance between $w1 and $w2 is: ", (1 - $simw).fmt("\t%.3f");
}

This program displays essentially the same output as the P5 program:

perl6 jano_dist.p6
Jaro-Winkler distance between FOOB and BARF is:         1.000
Jaro-Winkler distance between CRATE and TRACE is:       0.267
Jaro-Winkler distance between CRATE and CRATE is:       0.000
Jaro-Winkler distance between TRACE and CRATE is:       0.267
Jaro-Winkler distance between CREATE and TRACT is:      0.300
Jaro-Winkler distance between DWAYNE and DUANE is:      0.178

Aside from the small syntax adjustments needed between P5 and P6, note the convenient possibility in P6 to iterate over several values of a list at each step:

for @tests -> $w1, $w2 {
    # ...
}

Wrapping up

There was a third challenge this week: Send email using SendGrid API. For more information, visit the official page. This challenge was proposed by Gabor Szabo. The API challenge is optional but would love to see your solution.

As mentioned in earlier blog posts, I know next to nothing about this kind of topic, so I won't undertake anything on that subject and even less blog about it. Please try this challenge and provide answers if you know more that I do on such topic.

The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, June 9. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 9: Square Numbers and Functional Programming in Perl

In this other blog post, I provided some answers to Week 9 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Here, I want to use the opportunity of this challenge to illustrate some possibilities of functional programming in Perl (both Perl 5 and Perl 6) using the example of the first challenge of this week..

Challenge: Square Number With At Least 5 Distinct Digits

Write a script that finds the first square number that has at least 5 distinct digits.

A Data Pipeline in Perl 5

One of the solutions I suggested in my above-mentionned blog post was this script:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

for my $integer (100..1000) {
    my $square = $integer ** 2;
    my @digits = split //, $square;
    my %unique_digits = map {$_ => 1} @digits;
    if (scalar keys %unique_digits >= 5) {
        say "$integer -> $square";
        last;
    }
}

Although this is not the main discriminating feature of functional programming, one of the techniques commonly used in languages such as Lisp and its variants is data-flow programming or data pipeline: we take a list of data items and let them undergo a series of successive transformations to get to the desired result. The map function used above is an example of it: here it takes on its right-hand side a list of digits (the @digits array) as input and produces a list of pairs to populate the %unique_digits hash on the left-hand side. We can go further with this model and avoid these temporary variables.

The whole for loop above can be replaced by just three lines of code:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my @squares = map { $_ ** 2 } 100..200;
my @nums = grep { my %unique = map {$_ => 1} split //, $_; scalar keys %unique >= 5 ? 1 : 0} @squares;
say $nums[0];

The first line of real code should be read from right to left: we start with a range of integers (100..200), feed it to a map statement which produces the squares of these integers, and finally use the list thus generated to populate the @squares array. In a map statement, each value of the input list is aliased in turn to $_, so that the code block produces the squares of the input values.

The second line is a little bit more complicated. Basically, it takes the values of the @squares array as input and use the grep function to filter the squares that have 5 distinct digits. The grep code block builds a %unique hash for each number received as a parameter, and returns a true value for the input values that produce a hash with at least 5 items. Finally, values that are filtered are fed into the @num array. And the last line prints the first item of the @nums array, i.e. the first number having at least 5 distinct digits.

I must admit that this is probably not the best example to show the expressive power of data-flow processing. I could have built a simpler example for the purpose of a tutorial. But, on the other hand, it shows that I can do it with a real case imposed from outside.

From now on, we will drop the 4 boiler plate code lines at the script's beginning (the use ... lines) to avoid repetition in each code example, but they are of course necessary in any Perl 5 script (except possibly some simple one-liners).

Going one step further, the whole algorithm to find the first square number with 5 distinct digits can be rewritten as a single data pipeline:

say  +(grep { my %unique = map {$_ => 1} split //, $_; scalar keys %unique >= 5 ? 1 : 0} map { $_ ** 2 } 100..200)[0];

The one-liner solution presented in my other blog is essentially using the same techniques:

$ perl -E 'for (100..1000) { my %h = map {$_ => 1} split //, $_**2; say "$_ -> ", $_**2 and last if scalar %hash >= 5 }'
113 -> 12769

As mentioned in my original blog post, one slight problem with these implementations is that we don't really know in advance how large the range of successive integers needs to be. In that case, it is often better to use an infinite loop (for example while (1) { ... }) and to break out of it when we're done. Here, however, it seemed rather obvious to me that we would find a square with 5 distinct digits relatively quickly, so that for (100..1000) would certainly be a good enough approximation of an infinite range for our purpose.

Another possibility is to create an iterator. That's what we will do next.

Iterators, Closures and Anonymous Code References

Most programmers commonly use iterators, sometimes not knowing that it's called this way. For example, when you read a file line by line with a construct such as:

while (my $line = <$FH>) {
    # do something with $line
}

you're actually using an iterator.

An iterator is a function that returns values and keeps track of the last returned value to find out the next one. What we want here is a function that returns squares one by one, so that we don't need to compute values that are not needed. In our case, we would need a function that "remembers" the last integer it has used to generate the last square (or, alternatively, that remembers the next integer to use). For this, we could simply use a global variable, but that's considered bad practice. Rather, we will use a closure, i.e. a function that knows about the environment in which it was defined. For example, we could do something like this:

{
    my $num = 100;
    sub give_me_a_square {
        $num ++;
        return $num ** 2
    }
}
while (my $square = give_me_a_square()) {
    my %unique = map {$_ => 1} split //, $square;
    if (scalar keys %unique == 5) {
        say $square;
        last;
    }
}

Here, the give_me_a_square subroutine if defined within a block (the pair of curly braces) that creates a lexical scope within which the $num variable is also declared and initialized. Because of that, give_me_a_square "closes over" $num, it is a closure. When we call this subroutine, we are no longer within the scope where $num is defined, but the subroutine "remembers" about $num and about its current value.

Some people believe that closures have to be anonymous function, but this is not true: here, our give_me_a_square closure is a perfectly regular named subroutine. It is true, though, that closures are often anonymous code references, because the ability to pass around code references as an argument to another function or as a return value from a subroutine is part of their expressive power. So, a more canonical implementation of an iterator would use an anonymous code reference:

sub make_square_iterator {
    my $num = shift;
    return sub {
        $num++;
        return $num ** 2;
    }
}
my $square_iter = make_square_iterator 100;
while (my $square = $square_iter->()) {
    my %unique = map {$_ => 1} split //, $square;
    if (scalar keys %unique == 5) {
        say $square;
        last;
    }
}

The main advantage of this implementation over the previous one is that $num is no longer hard coded, but passed as an argument to the make_square_iterator subroutine, which means that we could call it several times with different initial values and generate as many iterators as we want, and each iterator would keep track of its own current value. Here, we need only one, and when make_square_iterator is called, it returns an anonymous subroutine or coderef which the caller stores in the $square_iter variable and calls each time it needs a new square.

The ability to create anonymous subroutines (as coderefs) dynamically is an essential part of Perl's expressive power.

To tell the truth, using an iterator for such a simple problem is a bit of an overkill, but I thought it constituted an interesting example to introduce this powerful technique.

Using a closure is the traditional way to create an iterator in Perl 5 since the beginning of Perl 5 in 1994. And this is what I commonly use at $work on some of our platforms where we are stuck with old versions of Perl. Version 5.10, however, introduced the state declarator which makes it possible to declare persistent private variables that are initialized only once (the first time the code line is executed). This feature needs to be activated, for example with a code line containing the use feature "state"; pragma. Using state variables makes the code of an iterator a bit simpler:

use feature qw/say state/;

sub provide_square {
    state $num = shift;
    return ++$num ** 2;
}
while (my $square = provide_square 100) {
    my %unique = map {$_ => 1} split //, $square;
    if (scalar keys %unique == 5) {
        say $square;
        last;
    }
}

To understand how this code works, remember that the state $num = shift; code line is executed only the first time the provide_square subroutine is called. On the following calls, $num is successively 101, 102, 103, etc.

Square Numbers in Perl 6

A data pipeline in functional style may look like this:

say first /\d+/, grep { 5 <= elems unique comb '', $_ }, map { $_ ** 2}, 100..*;

Note that first used as a functional subroutine needs a regex as a first argument. The /\d+/ isn't really useful for the algorithm, but is needed for first to work properly.

But we can use first with a grep-like syntax to make this more convenient:

say first { 5 <= elems unique comb '', $_ }, map { $_ ** 2}, 100..*;

The data pipeline may also use chained method invocations:

say (100..*).map(* ** 2).grep(*.comb.unique >= 5).first;

Perl 6 also has the ==> feed operator:

my $square = 100...* ==> map { $_ ** 2 } ==> grep(*.comb.unique >= 5)  ==> first /\d+/;
say $square;

or, probably better:

100...* ==> map { $_ ** 2 } ==> first(*.comb.unique >= 5)  ==> say();

There is also the <== leftward feed operator:

say()  <== first(*.comb.unique >= 5) <== map { $_ ** 2} <== 100..*;

We have no reason to try to build an iterator in Perl 6 as we did in Perl 5, since the lazy infinite list mechanism just offers what we need. But we can create an iterator if we want to. This is what it might look like using the state declarator:

sub provide-square (Int $in) {
    state $num = $in;
    return ++$num ** 2;
}
while my $square = provide-square 100 {
    if $square.comb.unique >= 5 {
        say $square;
        last;
    }
}

We could also create an iterator with a closure:

sub create-iter (Int $in) {
    my $num = $in;
    return sub {
        return ++$num ** 2;
    }
}
my &square-iter = create-iter 100;
while my $square = &square-iter() {
    if $square.comb.unique >= 5 {
        say $square;
        last;
    }
}

Acknowledgement

I originally learned about these techniques from Mark Jason Dominus's book, Higher Order Perl, probably the best CS book I've read in the last 15 years or so. The book is available for free on-line, but if you start reading it, you might very well end up buying a paper copy. At least, this is what happened to me, and I'm very happy to own a paper copy of it.

Wrapping up

The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, June 2. And, please, also spread the word about the Perl Weekly Challenge if you can.