May 2019 Archives

Perl Weekly Challenge # 10: Roman Numerals and the Jaro-Winkler Distance

These are some answers to the Week 10 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (June 2, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Roman Numerals

Write a script to encode/decode Roman numerals. For example, given Roman numeral CCXLVI, it should return 246. Similarly, for decimal number 39, it should return XXXIX. Checkout Wikipedia page for more information.

Of course, there are some Perl modules on the CPAN to convert from and to Roman numerals, but there wouldn't be any challenge if the idea were to use an existing module.

Most people know more or less how Roman numerals work. They use Latin letters to represent numbers:

|---------------------------------------------------|
| Symbol |  I  |  V  |  X  |  L  |  C  |  D  |  M   |
|---------------------------------------------------|
| Value  |  1  |  5  |  10 |  50 | 100 | 500 | 1000 |
|---------------------------------------------------|

In general, Roman numerals use additive notation: for example, MCLXXIII means 1000 + 100 + 50 + 20 + 3 = 1173. Or, at least, this is so when the symbols are written from left to right in decreasing value order.

If, however, a given symbol has a smaller value than a symbol placed on its right, then this is an example of subtractive notation: in that case, the smaller symbol is subtracted from the one its right. For example, IV means 1 subtracted from 5, i.e. 5 - 1 = 4. Similarly, IX and XC respectively mean 10 - 1 = 9 and 100 - 10 = 90. And MCMXLIX corresponds to 1000 + ( 1000 - 100) + (50 - 10) + (10 - 1) = 1949.

The overall problem, though, is that there is no general standard for Roman numerals. Applying the rules above makes it possible to decode more or less unambiguously any Roman numeral coded according to such aforesaid rules, but there may be several different possible ways to encode a number into a Roman numeral. For example, 99 could be encoded as XCXI or IC (or even XCVIIII or possibly LXXXXVIIII). The first transcription (XCXI) seems to be the most frequent one, so this is the one we will chose when encoding to Roman numerals. Still, IC seems to be a valid Roman numeral for 99, so we will try at least to be able to decode it if we find it.

Note that there is no Roman numeral for zero and the largest possible Roman numeral with the above rules is 3,999.

Roman Numerals in Perl 5

If Roman numerals only had the additive notation, it would be very easy: for converting a Roman numeral, just pick up each of the symbols in turn, add them up, and you're done. The trouble comes with subtractive notation.

So my first idea to decode a Roman numeral was to remove any subtractive notation part from the input Roman numeral and replace it with an additive notation. For example, given the numeral MCIX, I would replace IX with VIIII, thus yielding MCVIIII; it is now very easy to add the symbols' values to find 1009. We can use a series of regex substitutions for that:

sub remove_subtractive {
    my $roman = shift;
    for ($roman) {
        s/IV/IIII/;             # 4
        s/IX/VIIII/;            # 9    
        s/IL/XLVIIII/;          # 49
        s/XL/XXXX/;             # 40 to 49
        s/IC/LXXXXVIIII/;       # 99
        s/XC/LXXXX/;            # 90 to 98
        s/ID/XDVIIII/;          # 499
        s/XD/CDLXXXX/;          # 490 to 499
        s/CD/CCCC/;             # 400 to 499
        s/IM/CMLXXXXVIIII/;     # 999
        s/XM/CMLXXXX/;          # 990 to 998
        s/CM/DCCCC/;            # 900 to 999
    }
    return $roman;
}

sub from_roman {
    my $additive = remove_subtractive uc shift;
    my $arabic = 0; 
    for (split //, $additive){
        $arabic += $roman_table{$_};
    }
    return $arabic;
}

That's of course way too complicated. As soon as I started typing the first few of these regex substitutions in the remove_subtractive subroutine, I knew I wanted to find a better way to decode Roman numerals. I nonetheless completed it, because I wanted to show it on this blog. I also tested it quite thoroughly, and it seems to work properly.

The new idea is to read the symbols one by one from left to right and to add the values, keeping track of the previously seen value. If the current value is larger than the previous value, then we were in a case of a subtractive combination at the previous step, and we need to subtract twice the previous value (once because it is a subtractive combination, and once again because we have previously erroneously added it). That's actually quite simple (see how the code of the from_roman subroutine below is much shorter and simpler than what we had tried above).

For encoding Arabic numerals to Roman numerals, the easiest is to perform integer division with decreasing values corresponding to Roman numerals (i.e. M D C L X V I). For example, suppose we want to encode 2019. We first try to divide by 1,000 (corresponding to M). We get 2, so the start of the string representing the Roman numeral will be MM. Then we continue with the remainder, i.e. 19. We try integer division successively with 500, 100 and 50 and get 0, so don't do anything with the result. Next we try with 10 and get 1, so the temporary result is now MMX. The remainder is 9; if we continue the same way with 9, we would divide by 5, add V to our string, and eventually obtain MMXVIIII, which is a correct (simplistic) Roman numeral for 2019, but not really what we want, since we want to apply the subtractive combination and get MMXIX.

Rather than reprocessing VIIII into IX (we've seen before how tedious this could be with regexes), we can observe that if our list of decreasing Roman values also includes IX (9), then it will work straight without any need to reprocess the result. So, our list of decreasing values corresponding to Roman numperals needs to be augmented with subtractive cases to M CM D CD C XC L XL X IX V IV I (corresponding to numbers 1000, 900, 500, 100, 90, 50, 40, 10, 9, 5, 4, 1). Using this list instead of the original one removes any need for special processing for subtractive combinations: we just need to keep doing integer divisions with the decreasing values and continue the rocessing with the remainder. This what the to_roman subroutine below does.

So our program to convert from and to Roman numerals is as follows:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;


my %rom_tab = (I => 1,  V => 5, X => 10, L => 50, C => 100, D => 500, M => 1000);

sub from_roman {
    my $roman = uc shift;
    my $numeric = 0;
    my $prev_letter = "M";
    for my $letter (split //, $roman) {
        $numeric -= 2 * $rom_tab{$prev_letter} 
            if $rom_tab{$letter} > $rom_tab{$prev_letter};
        $numeric += $rom_tab{$letter};
        $prev_letter = $letter;
    }
    return $numeric;
}

sub to_roman {
    my $arabic = shift;
    warn "$arabic out of bounds" unless $arabic > 0 and $arabic < 4000;
    my %hash = %rom_tab;
    $hash{$_->[0]} = $_->[1] for (['IV', 4], ['IX', 9], ['XL', 40], 
        ['XC', 90], ['CD', 400], ['CM', 900] );
    my $roman = "";
    for my $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
        my $num = int ($arabic / $hash{$key});
        $roman .= $key x $num;
        $arabic -= $hash{$key} * $num; 
    }
    return $roman;
}

say "From Roman to Arabic";
say "$_\t=>\t", from_roman $_ for qw <MM MCM LXXIII XCIII IC XCIX xv>;

my @test_nums = qw <19 42 67 90 97 99 429 498 687 938 949 996 2145 3597>;
say "From Arabic to Roman";
say "$_\t=>\t", to_roman $_ for @test_nums;

say "Some round trips: from Arabic to Roman to Arabic";
say "$_\t=>\t", from_roman to_roman $_ for @test_nums;

say "Sanity check (round trip through the whole range)";
for (1..3999) {
    my $result = from_roman to_roman $_;
    say "Error on $_ " unless $result == $_;
}

And this is the result of the various tests:

$ perl roman.pl
From Roman to Arabic
MM      =>      2000
MCM     =>      1900
LXXIII  =>      73
XCIII   =>      93
IC      =>      99
XCIX    =>      99
xv      =>      15
From Arabic to Roman
19      =>      XIX
42      =>      XLII
67      =>      LXVII
90      =>      XC
97      =>      XCVII
99      =>      XCIX
429     =>      CDXXIX
498     =>      CDXCVIII
687     =>      DCLXXXVII
938     =>      CMXXXVIII
949     =>      CMXLIX
996     =>      CMXCVI
2145    =>      MMCXLV
3597    =>      MMMDXCVII
Some round trips: from Arabic to Roman to Arabic
19      =>      19
42      =>      42
67      =>      67
90      =>      90
97      =>      97
99      =>      99
429     =>      429
498     =>      498
687     =>      687
938     =>      938
949     =>      949
996     =>      996
2145    =>      2145
3597    =>      3597
Sanity check (round trip through the whole range)

Note that the sanity check (the last test) does not print anything because the round trip worked correctly for the whole range between 1 and 3,999 (it would display only anomalies, but did not find any).

Also note that, in the Roman to Arabic conversion, both IC and XCIX return 99, as expected, whereas, in the opposite conversion, 99 returns XCIX.

Roman Numerals in Perl 6

Since we think that we now have a good algorithm to convert to and from Roman numerals (and we don't see any P6 feature to significantly simplify it), we can just translate that into Perl 6:

use v6;

subset Roman-str of Str where $_ ~~ /^<[IVXLCDMivxlcdm]>+$/;

my %rom-tab = < I 1   V 5   X 10   L 50   C 100  D 500  M 1000 
               IV 4  IX 9   XL 40  XC 90  CD 400   CM 900 >;
my @ordered_romans = reverse sort { %rom-tab{$_} }, keys %rom-tab;

sub from-roman (Roman-str $roman) {
    my $numeric = 0;
    my $prev_letter = "M";
    for $roman.uc.comb -> $letter {
        $numeric -= 2 * %rom-tab{$prev_letter} 
            if %rom-tab{$letter} > %rom-tab{$prev_letter};
        $numeric += %rom-tab{$letter};
        $prev_letter = $letter;
    }
    return $numeric;
}

sub to-roman (Int $arabic is copy where  { 0 < $_ < 4000 }) {
    my $roman = "";
    for @ordered_romans -> $key {
        my $num = ($arabic / %rom-tab{$key}).Int;
        $roman ~= $key x $num;
        $arabic -= %rom-tab{$key} * $num; 
    }
    return $roman;
}

Besides the small necessary syntactical adjustments between P5 and P6, there are two or three additional changes. First, I used the subroutine signatures to add some limited form of data validation. For checking input Roman numerals, I created the Roman-str subtype (well, really, a subset) which accepts strings that are made only with the seven letters used in Roman numerals (both lower and upper case). This makes it possible to validate (to a certain extent) the argument passed to the from-roman subroutine. Of course, some strings made of these letters may still be invalid Roman numerals, but, at least, we'll get an exception if we inadvertently pass an Arabic number to it.

Similarly, since, according to our rules, Roman numerals can represent numbers between 1 and 3,999, the signature of the to-roman subroutine only accepts integers larger than 0 and less than 4,000.

Also, since I knew from the beginning that I was going to use the M CM D CD C XC L XL X IX V IV I sequence of Roman numerals in the to-roman subroutine, I initialized the %rom-tab hash with all these values, rather than adding some of them (those with two letters) to the hash afterwards as it was done in the P5 version. This is not a problem since the other subroutine (from-roman) using the hash only looks up single letters.

Testing the Perl 6 Roman Numerals Program

It would be fairly easy to use almost exactly the same tests as in Perl 5;

say "$_\t=>\t", from_roman $_ for <MM MCM LXXIII XCIII IC XCIX xv>;

my @test_nums =  <19 42 67 90 97 99 429 498 687 938 949 996 2145 3597>;
say "From Arabic to Roman";
say "$_\t=>\t", to_roman $_.Int for @test_nums;

say "Some round trips: from Arabic to Roman to Arabic";
say "$_\t=>\t", from_roman to_roman $_.Int for @test_nums;

say "Sanity check (round trip through the whole range)";
for (1..3999) {
    my $result = from_roman to_roman $_;
    say "Error on $_ " unless $result == $_;
}

These tests work well and produce the same output as the one listed above for the Perl 5 program.

But there is a better way of testing, which I introduced in my blog on week 2 of the Perl Weekly Challenge: the Test module. I was hoping at the time that I would be able to provide a more significant example, and this challenge is a perfect opportunity to do that. This enables us to do much more thorough testing, including tests on exceptions (for example for a type mismatch in a subroutine's argument).

This my example testing code:

use Test;
plan 45;

say "\nFrom Roman to Arabic";
for < MM 2000 MCM 1900 LXXIII 73 XCIII 93 IC 99 XCIX 99 xv 15> -> $roman, $arabic {
    is from-roman($roman), $arabic, "$roman => $arabic";
}
isnt from-roman("VII"), 8, "OK: VII not equal to 8";
for <12 foo bar MCMA> -> $param {
    dies-ok {from-roman $param}, "Caught exception OK in from-roman: wrong parameter";
}
say "\nFrom Arabic to Roman";
my %test-nums = map { $_[0] => $_[1] }, (
    <19 42 67 90 97 99 429 498 687 938 949 996 2145 3597> Z 
    <XIX XLII LXVII XC XCVII XCIX CDXXIX CDXCVIII DCLXXXVII 
     CMXXXVIII CMXLIX CMXCVI MMCXLV MMMDXCVII>);
for %test-nums.keys -> $key {
    is to-roman($key.Int), %test-nums{$key}, "$key => %test-nums{$key}";
}
for 0, 4000, "foobar", 3e6 -> $param {
    dies-ok { to-roman $param}, "Caught exception OK in to-roman: wrong parameter";
}
say "\nSome round trips: from Arabic to Roman to Arabic";
for %test-nums.keys.sort -> $key {
    is from-roman(to-roman $key.Int), $key, "Round trip OK for $key";
}
my $upper-bound = 3999;
say "\nSanity check (round trip through the whole range 1 .. $upper-bound range)";

lives-ok {
    for (1..$upper-bound) -> $arabic {
        die "Failed round trip on $arabic" if from-roman(to-roman $arabic) != $arabic;
    }
}, "Passed round trip on the full 1..$upper-bound range";

The second line above says that we're going to run 45 test cases (the last test case, the sanity check round trip, is actually testing 3,999 subcases, but it counts as only 1 case).

The is function test for equality of its first two arguments (and isnt tests reports "ok" is the values are not equal). The dies-ok checks that the code being tested throws an exception (good here to check that invalid subroutine arguments are rejected) and the lives-ok check that the code block being tested does not throw any exception.

These tests produce the following output:

1..45

From Roman to Arabic
ok 1 - MM => 2000
ok 2 - MCM => 1900
ok 3 - LXXIII => 73
ok 4 - XCIII => 93
ok 5 - IC => 99
ok 6 - XCIX => 99
ok 7 - xv => 15
ok 8 - OK: VII not equal to 8
ok 9 - Caught exception OK in from-roman: wrong parameter
ok 10 - Caught exception OK in from-roman: wrong parameter
ok 11 - Caught exception OK in from-roman: wrong parameter
ok 12 - Caught exception OK in from-roman: wrong parameter

From Arabic to Roman
ok 13 - 687 => DCLXXXVII
ok 14 - 97 => XCVII
ok 15 - 938 => CMXXXVIII
ok 16 - 498 => CDXCVIII
ok 17 - 19 => XIX
ok 18 - 429 => CDXXIX
ok 19 - 3597 => MMMDXCVII
ok 20 - 2145 => MMCXLV
ok 21 - 67 => LXVII
ok 22 - 90 => XC
ok 23 - 99 => XCIX
ok 24 - 996 => CMXCVI
ok 25 - 949 => CMXLIX
ok 26 - 42 => XLII
ok 27 - Caught exception OK in to-roman: wrong parameter
ok 28 - Caught exception OK in to-roman: wrong parameter
ok 29 - Caught exception OK in to-roman: wrong parameter
ok 30 - Caught exception OK in to-roman: wrong parameter

Some round trips: from Arabic to Roman to Arabic
ok 31 - Round trip OK for 19
ok 32 - Round trip OK for 2145
ok 33 - Round trip OK for 3597
ok 34 - Round trip OK for 42
ok 35 - Round trip OK for 429
ok 36 - Round trip OK for 498
ok 37 - Round trip OK for 67
ok 38 - Round trip OK for 687
ok 39 - Round trip OK for 90
ok 40 - Round trip OK for 938
ok 41 - Round trip OK for 949
ok 42 - Round trip OK for 97
ok 43 - Round trip OK for 99
ok 44 - Round trip OK for 996

Sanity check (round trip through the whole range 1 .. 3999 range)
ok 45 - Passed round trip on the full 1..3999 range

If you want to know more about testing strategies in Perl 6, please look at the Perl 6 testing tutorial, the Test module documentation, or the Debugging section of Chapter 14 of my Perl 6 book.

Challenge 2: The Jaro-Winkler Distance

Write a script to find the Jaro-Winkler distance between two strings. For more information check the Wikipedia page.

The Jaro–Winkler distance is a string metric measuring an edit distance between two sequences. The lower the Jaro–Winkler distance for two strings is, the more similar the strings are. The score is normalized such that 1 equates to no similarity and 0 is an exact match.

To find the Jaro-Winkler distance between two strings, we need first to find the Jaro similarity. Then we compute the Jaro-Winkler similarity, and the Jaro-Winkler distance is just 1 minus the Jaro-Winkler similarity.

Jaro-Winkler in Perl 5

This is my attempt at computing the Jaro-Winkler distance in Perl 5:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub max { $_[0] > $_[1] ? $_[0] : $_[1] }
sub min { $_[0] > $_[1] ? $_[1] : $_[0] }

sub simj {
    my ($str1, $str2) = @_;
    my $len1 = length $str1;
    my $len2 = length $str2;
    my $dmax = int (max($len1, $len2) / 2) -1;
    my @st1 = split //, $str1;
    my $i = 0;
    my @matches;
    for my $letter (split //, $str2) {
        push @matches, $letter if (grep { $letter eq $_ } 
            @st1[max(0,$i-$dmax)..min($i+$dmax,$len1-1)]);
        $i++;
    }
    my $nb_matches = scalar @matches;
    return 0 if $nb_matches == 0;
    my %matching_letters = map { $_ => 1} @matches;
    my @matches_str1 = grep exists $matching_letters{$_}, 
        split //, $str1;
    my $disorder = 0;
    for my $i (0..$nb_matches-1) {
        $disorder++ if $matches[$i] ne $matches_str1[$i];
    }
    my $transposition = $disorder / 2;
    return ($nb_matches / $len1 + $nb_matches / $len2 + 
        ($nb_matches - $transposition)/$nb_matches) / 3;
}
sub simw {
    my ($str1, $str2) = @_;
    my $p_constant = 0.1;
    my $length_prefix = 0;
    for my $count (0..3) {
        last if substr $str1, $count, 1 ne substr $str2, $count, 1;
        $length_prefix++;
    }
    my $simj = simj $str1, $str2;
    return $simj + $length_prefix * $p_constant * (1 - $simj);
}

my @tests = ( ["FOO", "BAR"], ["CRATE", "TRACE"], 
    ["CRATE", "CRATE"], ["TRACE", "CRATE"], 
    ["CREATE", "TRACT"], ["DWAYNE", "DUANE"], );
for my $word_pair (@tests) {
    my ($w1, $w2) = @$word_pair;
    my $simw = simw $w1, $w2;
    say "Jaro-Winkler distance between $w1 and $w2 is: ", 1 - $simw;
}

Note that the Wikipedia description of the transposition calculation isn't very detailed, so I'm not sure to really understand it. I think that the way it is done here makes sense, but I'm not sure this is really what is required.

With the tests listed at the bottom of the above program, this program displays the following output:

$ perl jaro_dist.pl
Jaro-Winkler distance between FOO and BAR is: 1
Jaro-Winkler distance between CRATE and TRACE is: 0.266666666666667
Jaro-Winkler distance between CRATE and CRATE is: 0
Jaro-Winkler distance between TRACE and CRATE is: 0.266666666666667
Jaro-Winkler distance between CREATE and TRACT is: 0.3
Jaro-Winkler distance between DWAYNE and DUANE is: 0.177777777777778

Keeping in mind the doubt about the calculation of the transposition, the results seem consistent. In particular, two indentical words have a distance of 0 and two completely different words (no common letter) have a distance of 1.

Jaro-Winkler in Perl 6

This is the same algorithm for computing the Jaro-Winkler distance as in Perl 5:

use v6;

sub simjaro (Str $str1, Str $str2) {
    my $len1 = $str1.chars;
    my $len2 = $str2.chars;
    my $dmax = (max($len1, $len2) / 2).Int -1;
    my @st1 = $str1.comb;
    my $i = 0;
    my @matches;
    for $str2.comb -> $letter {
        push @matches, $letter if (grep { $letter eq $_ }, 
            @st1[max(0,$i-$dmax)..min($i+$dmax,$len1-1)]);
        $i++;
    }
    my $nb_matches = @matches.elems;
    return 0 if $nb_matches == 0;
    my %matching_letters = map { $_ => 1}, @matches;
    my @matches_str1 = grep { %matching_letters{$_}:exists }, 
        $str1.comb;
    my $disorder = 0;
    for 0..$nb_matches-1 -> $i {
        $disorder++ if @matches[$i] ne @matches_str1[$i];
    }
    my $transposition = $disorder / 2;
    return ($nb_matches / $len1 + $nb_matches / $len2 + 
        ($nb_matches - $transposition)/$nb_matches) / 3;
}
sub simwinkler (Str $str1, Str $str2) {
    my $p_constant = 0.1;
    my $length_prefix = 0;
    for 0..3 -> $count {
        last if substr $str1, $count, 1 ne substr $str2, $count, 1;
        $length_prefix++;
    }
    my $simj = simjaro $str1, $str2;
    return $simj + $length_prefix * $p_constant * (1 - $simj);
}

my @tests = < FOOB BARF   CRATE TRACE   CRATE CRATE   TRACE CRATE   
    CREATE TRACT   DWAYNE DUANE >;
for @tests -> $w1, $w2 {
    my $simw = simwinkler $w1, $w2;
    say "Jaro-Winkler distance between $w1 and $w2 is: ", (1 - $simw).fmt("\t%.3f");
}

This program displays essentially the same output as the P5 program:

perl6 jano_dist.p6
Jaro-Winkler distance between FOOB and BARF is:         1.000
Jaro-Winkler distance between CRATE and TRACE is:       0.267
Jaro-Winkler distance between CRATE and CRATE is:       0.000
Jaro-Winkler distance between TRACE and CRATE is:       0.267
Jaro-Winkler distance between CREATE and TRACT is:      0.300
Jaro-Winkler distance between DWAYNE and DUANE is:      0.178

Aside from the small syntax adjustments needed between P5 and P6, note the convenient possibility in P6 to iterate over several values of a list at each step:

for @tests -> $w1, $w2 {
    # ...
}

Wrapping up

There was a third challenge this week: Send email using SendGrid API. For more information, visit the official page. This challenge was proposed by Gabor Szabo. The API challenge is optional but would love to see your solution.

As mentioned in earlier blog posts, I know next to nothing about this kind of topic, so I won't undertake anything on that subject and even less blog about it. Please try this challenge and provide answers if you know more that I do on such topic.

The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, June 9. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 9: Square Numbers and Functional Programming in Perl

In this other blog post, I provided some answers to Week 9 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Here, I want to use the opportunity of this challenge to illustrate some possibilities of functional programming in Perl (both Perl 5 and Perl 6) using the example of the first challenge of this week..

Challenge: Square Number With At Least 5 Distinct Digits

Write a script that finds the first square number that has at least 5 distinct digits.

A Data Pipeline in Perl 5

One of the solutions I suggested in my above-mentionned blog post was this script:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

for my $integer (100..1000) {
    my $square = $integer ** 2;
    my @digits = split //, $square;
    my %unique_digits = map {$_ => 1} @digits;
    if (scalar keys %unique_digits >= 5) {
        say "$integer -> $square";
        last;
    }
}

Although this is not the main discriminating feature of functional programming, one of the techniques commonly used in languages such as Lisp and its variants is data-flow programming or data pipeline: we take a list of data items and let them undergo a series of successive transformations to get to the desired result. The map function used above is an example of it: here it takes on its right-hand side a list of digits (the @digits array) as input and produces a list of pairs to populate the %unique_digits hash on the left-hand side. We can go further with this model and avoid these temporary variables.

The whole for loop above can be replaced by just three lines of code:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my @squares = map { $_ ** 2 } 100..200;
my @nums = grep { my %unique = map {$_ => 1} split //, $_; scalar keys %unique >= 5 ? 1 : 0} @squares;
say $nums[0];

The first line of real code should be read from right to left: we start with a range of integers (100..200), feed it to a map statement which produces the squares of these integers, and finally use the list thus generated to populate the @squares array. In a map statement, each value of the input list is aliased in turn to $_, so that the code block produces the squares of the input values.

The second line is a little bit more complicated. Basically, it takes the values of the @squares array as input and use the grep function to filter the squares that have 5 distinct digits. The grep code block builds a %unique hash for each number received as a parameter, and returns a true value for the input values that produce a hash with at least 5 items. Finally, values that are filtered are fed into the @num array. And the last line prints the first item of the @nums array, i.e. the first number having at least 5 distinct digits.

I must admit that this is probably not the best example to show the expressive power of data-flow processing. I could have built a simpler example for the purpose of a tutorial. But, on the other hand, it shows that I can do it with a real case imposed from outside.

From now on, we will drop the 4 boiler plate code lines at the script's beginning (the use ... lines) to avoid repetition in each code example, but they are of course necessary in any Perl 5 script (except possibly some simple one-liners).

Going one step further, the whole algorithm to find the first square number with 5 distinct digits can be rewritten as a single data pipeline:

say  +(grep { my %unique = map {$_ => 1} split //, $_; scalar keys %unique >= 5 ? 1 : 0} map { $_ ** 2 } 100..200)[0];

The one-liner solution presented in my other blog is essentially using the same techniques:

$ perl -E 'for (100..1000) { my %h = map {$_ => 1} split //, $_**2; say "$_ -> ", $_**2 and last if scalar %hash >= 5 }'
113 -> 12769

As mentioned in my original blog post, one slight problem with these implementations is that we don't really know in advance how large the range of successive integers needs to be. In that case, it is often better to use an infinite loop (for example while (1) { ... }) and to break out of it when we're done. Here, however, it seemed rather obvious to me that we would find a square with 5 distinct digits relatively quickly, so that for (100..1000) would certainly be a good enough approximation of an infinite range for our purpose.

Another possibility is to create an iterator. That's what we will do next.

Iterators, Closures and Anonymous Code References

Most programmers commonly use iterators, sometimes not knowing that it's called this way. For example, when you read a file line by line with a construct such as:

while (my $line = <$FH>) {
    # do something with $line
}

you're actually using an iterator.

An iterator is a function that returns values and keeps track of the last returned value to find out the next one. What we want here is a function that returns squares one by one, so that we don't need to compute values that are not needed. In our case, we would need a function that "remembers" the last integer it has used to generate the last square (or, alternatively, that remembers the next integer to use). For this, we could simply use a global variable, but that's considered bad practice. Rather, we will use a closure, i.e. a function that knows about the environment in which it was defined. For example, we could do something like this:

{
    my $num = 100;
    sub give_me_a_square {
        $num ++;
        return $num ** 2
    }
}
while (my $square = give_me_a_square()) {
    my %unique = map {$_ => 1} split //, $square;
    if (scalar keys %unique == 5) {
        say $square;
        last;
    }
}

Here, the give_me_a_square subroutine if defined within a block (the pair of curly braces) that creates a lexical scope within which the $num variable is also declared and initialized. Because of that, give_me_a_square "closes over" $num, it is a closure. When we call this subroutine, we are no longer within the scope where $num is defined, but the subroutine "remembers" about $num and about its current value.

Some people believe that closures have to be anonymous function, but this is not true: here, our give_me_a_square closure is a perfectly regular named subroutine. It is true, though, that closures are often anonymous code references, because the ability to pass around code references as an argument to another function or as a return value from a subroutine is part of their expressive power. So, a more canonical implementation of an iterator would use an anonymous code reference:

sub make_square_iterator {
    my $num = shift;
    return sub {
        $num++;
        return $num ** 2;
    }
}
my $square_iter = make_square_iterator 100;
while (my $square = $square_iter->()) {
    my %unique = map {$_ => 1} split //, $square;
    if (scalar keys %unique == 5) {
        say $square;
        last;
    }
}

The main advantage of this implementation over the previous one is that $num is no longer hard coded, but passed as an argument to the make_square_iterator subroutine, which means that we could call it several times with different initial values and generate as many iterators as we want, and each iterator would keep track of its own current value. Here, we need only one, and when make_square_iterator is called, it returns an anonymous subroutine or coderef which the caller stores in the $square_iter variable and calls each time it needs a new square.

The ability to create anonymous subroutines (as coderefs) dynamically is an essential part of Perl's expressive power.

To tell the truth, using an iterator for such a simple problem is a bit of an overkill, but I thought it constituted an interesting example to introduce this powerful technique.

Using a closure is the traditional way to create an iterator in Perl 5 since the beginning of Perl 5 in 1994. And this is what I commonly use at $work on some of our platforms where we are stuck with old versions of Perl. Version 5.10, however, introduced the state declarator which makes it possible to declare persistent private variables that are initialized only once (the first time the code line is executed). This feature needs to be activated, for example with a code line containing the use feature "state"; pragma. Using state variables makes the code of an iterator a bit simpler:

use feature qw/say state/;

sub provide_square {
    state $num = shift;
    return ++$num ** 2;
}
while (my $square = provide_square 100) {
    my %unique = map {$_ => 1} split //, $square;
    if (scalar keys %unique == 5) {
        say $square;
        last;
    }
}

To understand how this code works, remember that the state $num = shift; code line is executed only the first time the provide_square subroutine is called. On the following calls, $num is successively 101, 102, 103, etc.

Square Numbers in Perl 6

A data pipeline in functional style may look like this:

say first /\d+/, grep { 5 <= elems unique comb '', $_ }, map { $_ ** 2}, 100..*;

Note that first used as a functional subroutine needs a regex as a first argument. The /\d+/ isn't really useful for the algorithm, but is needed for first to work properly.

But we can use first with a grep-like syntax to make this more convenient:

say first { 5 <= elems unique comb '', $_ }, map { $_ ** 2}, 100..*;

The data pipeline may also use chained method invocations:

say (100..*).map(* ** 2).grep(*.comb.unique >= 5).first;

Perl 6 also has the ==> feed operator:

my $square = 100...* ==> map { $_ ** 2 } ==> grep(*.comb.unique >= 5)  ==> first /\d+/;
say $square;

or, probably better:

100...* ==> map { $_ ** 2 } ==> first(*.comb.unique >= 5)  ==> say();

There is also the <== leftward feed operator:

say()  <== first(*.comb.unique >= 5) <== map { $_ ** 2} <== 100..*;

We have no reason to try to build an iterator in Perl 6 as we did in Perl 5, since the lazy infinite list mechanism just offers what we need. But we can create an iterator if we want to. This is what it might look like using the state declarator:

sub provide-square (Int $in) {
    state $num = $in;
    return ++$num ** 2;
}
while my $square = provide-square 100 {
    if $square.comb.unique >= 5 {
        say $square;
        last;
    }
}

We could also create an iterator with a closure:

sub create-iter (Int $in) {
    my $num = $in;
    return sub {
        return ++$num ** 2;
    }
}
my &square-iter = create-iter 100;
while my $square = &square-iter() {
    if $square.comb.unique >= 5 {
        say $square;
        last;
    }
}

Acknowledgement

I originally learned about these techniques from Mark Jason Dominus's book, Higher Order Perl, probably the best CS book I've read in the last 15 years or so. The book is available for free on-line, but if you start reading it, you might very well end up buying a paper copy. At least, this is what happened to me, and I'm very happy to own a paper copy of it.

Wrapping up

The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, June 2. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 9: Squares and Rankings

These are some answers to the Week 9 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Challenge # 1: Square Number With At Least 5 Distinct Digits

Write a script that finds the first square number that has at least 5 distinct digits. This was proposed by Laurent Rosenfeld.

Again a challenge suggested by me. I swear that I did not try to solve any of the challenge proposals I sent to Mohammad before submitting these proposals to him. Even the special case of the perfect number challenge of last week is no exception, since, as I explained in my blog post on it, while it stemmed from a CS course assignment of 28 years ago that I solved at the time, the requirement I suggested was markedly different and significantly more difficult.

In the case of the challenge of this week, I think it is a quite simple one, which is IMHO fair, since my understanding is that Mohammad intended each week's first assignment to be "beginner oriented" (even though there may have been a couple of cases where the first challenge wasn't so easy).

There might be a slight ambiguity in the question (please don't tell me, I know I'm guilty for that). I consider that we want at least 5 distinct digits, but don't care if some of the digits have duplicates. For example, in my view, 105625 is the square of 325 and has at least 5 distinct digits and thus qualifies as a "square number that has at least 5 distinct digits" (except, of course, that it isn't the first one, but it would be a valid answer if it happened to be the first one). As it turns out, this possible ambiguity is immaterial, since the first number satisfying the requirement has only 5 digits anyway (and therefore no duplicate). The point, though, is that our code doesn't need to care about possible duplicate digits, provided we can count at least 5 distinct digits.

Anyway, let's get around to it without further ado.

Square Numbers in Perl 5

We need square numbers with 5 digits, so we'll loop on successive integers from 100 on and compute their square (since the squares of smaller integers are bound to have less that 5 digits). Then, the split builtin function will provide the individual digits, which we store in a hash to remove duplicate digits. As soon as the hash has 5 items, we can print the number.

This is simple enough to be done in a one-liner:

$ perl -E 'for (100..1000) { my %h = map {$_ => 1} split //, $_**2; say "$_ -> ", $_**2 and last if scalar %hash >= 5 }'
113 -> 12769

If you prefer to see a real script, here is what it could look like:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

for my $integer (100..1000) {
    my $square = $integer ** 2;
    my @digits = split //, $square;
    my %unique_digits = map {$_ => 1} @digits;
    if (scalar keys %unique_digits >= 5) {
        say "$integer -> $square";
        last;
    }
}

Working on the Perl 6 version (see below) reminded me that the Perl 5 list::util core module has a uniq function to remove duplicates from a list. Furthermore, in scalar context, it returns the number of elements that would have been returned as a list, which is exactly what we need here. However, this requires a recent version (version 1.45 or above) of List::Util, so this one-liner might not work on your computer if you have older versions and it did not work for me on the first three boxes where I initially tried it (you probably need Perl 5.26 to have the right version of this core module out of the box):

$ perl -MList::Util=uniq -E 'for (100..1000) { say "$_ -> ", $_**2 and last if uniq (split //, $_**2) >= 5}'
113 -> 12769

One slight problem with these implementations is that we don't really know in advance how large the range of successive integer needs to be. In that case, it is often better to use an infinite loop (for example while (1) { ... }). Here, however, it seemed rather obvious to me that we would find a square with 5 distinct digits relatively quickly, so that for (100..1000) would certainly be a good enough approximation of an infinite range for our purpose (and, for some idiosyncratic reasons hard to explain, I tend to like for loops better than while loop).

Another possibility is to create an iterator. We'll cover that in another blog.

Square Numbers in Perl 6

Thinking about adapting the P5 one-liner, my first thought was to use a set instead of a hash to remove duplicate digits, but, just a few seconds later, it came to my mind that there is a built-in unique function to do just that.

$ perl6 -e 'say $_ ** 2 and last if ($_**2).comb.unique >= 5 for 100..*'
12769

Aside from the syntactic adjustments, the important difference is that we don't have to worry about the range upper bound: we just generate a lazy infinite list of successive integers larger than or equal to 100.

This could be also done by generating directly an infinite list of squares:

$ perl6 -e 'say $_ and last if .comb.unique >= 5 for map {$_ **2}, 100..*;'
12769

This is what it might look like if you prefer a full-fledged script:

use v6;

my @squares = map {$_ ** 2}, 100..*;   # lazy infinite list of squares
for @squares -> $square {
    if $square.comb.unique >= 5 {
        say $square;
        last;
    }
}

We could also remove any for loop and if conditional by just building successively two infinite lists:

use v6;

my @squares = map {$_ ** 2}, 100..*;
my @candidates = grep { .comb.unique >= 5}, @squares;
say @candidates[0];

By the way, this idea of using infinite lists can be boiled down to another approach for a one-liner:

$ perl6 -e 'say (grep { .comb.unique >= 5}, map {$_ ** 2}, 100..*)[0];'
12769

Finally, another possible approach is to use chained method invocations:

$ perl6 -e 'say (100..*).map(* ** 2).grep(*.comb.unique >= 5).first;'
12769

Rankings

Write a script to perform different types of ranking as described below:

1. Standard Ranking (1224): Items that compare equal receive the same ranking number, and then a gap is left in the ranking numbers.

2. Modified Ranking (1334): It is done by leaving the gaps in the ranking numbers before the sets of equal-ranking items.

3. Dense Ranking (1223): Items that compare equally receive the same ranking number, and the next item(s) receive the immediately following ranking number.

For more information, please refer to wikipage.

Rankings in Perl 5

The first idea is to write three subroutines to perform each of the rankings. I tried to think about common code that could be shared to avoid repeating one-self, but could not come up with any obvious idea in this direction. So I thought: let's just write the three subroutines, and we'll see later if some things can be factored out.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub standard {
    my %scores = @_;
    my ($prev_rank, $prev_val, $rank) = (0, 0, 0);
    say "Rank\tID\tScore";
    for my $key (sort { $scores{$a} <=> $scores{$b} } keys %scores) {
        $rank++;
        if ($scores{$key} > $prev_val) {
            say $rank, "\t$key\t$scores{$key}";
            $prev_rank = $rank;
        } else {
            say $prev_rank, "\t$key\t$scores{$key}";
        }
        $prev_val =  $scores{$key};
    }
}

sub modified {
    my %scores = @_;
    my ($prev_val, $rank) = (0, 0);
    my @buffer;
    say "Rank\tID\tScore";
    for my $key (sort { $scores{$a} <=> $scores{$b} } keys %scores) {
        $rank++;
        if ($scores{$key} > $prev_val) {
            say $rank - 1, $_ for @buffer;
            @buffer = ("\t$key\t$scores{$key}");
        } else {
            push @buffer, "\t$key\t$scores{$key}";
        }
        $prev_val =  $scores{$key};
    }
    say $rank, shift @buffer while @buffer;
}

sub dense { 
    my %scores = @_;
    my ($prev_rank, $prev_val, $rank) = (0, 0, 0);
    say "Rank\tID\tScore";
    for my $key (sort { $scores{$a} <=> $scores{$b} } keys %scores) {
        if ($scores{$key} > $prev_val) {
            $rank++;
            say $rank, "\t$key\t$scores{$key}";
            $prev_rank = $rank;
        } else {
            say $prev_rank, "\t$key\t$scores{$key}";
        }
        $prev_val =  $scores{$key};
    }
}

my %scores = (a => 4, b => 5, c => 3, d => 5, e => 1, f => 4, g => 6, h => 4, i =>6);
say "      Standard";
standard(%scores);
say "\n      Modified";
modified(%scores);
say "\n      Dense";
dense(%scores);

This duly displays the following:

      Standard
Rank    ID      Score
1       e       1
2       c       3
3       a       4
3       h       4
3       f       4
6       d       5
6       b       5
8       g       6
8       i       6

      Modified
Rank    ID      Score
1       e       1
2       c       3
5       a       4
5       h       4
5       f       4
7       d       5
7       b       5
9       g       6
9       i       6

      Dense
Rank    ID      Score
1       e       1
2       c       3
3       a       4
3       h       4
3       f       4
4       d       5
4       b       5
5       g       6
5       i       6

Phew, that seems like a lot of code for such seemingly simple task. Of course, we could certainly make it slightly more concise here or there, but nothing really significant. The code almost looks the same in the three subroutines, but it has enough small differences to make it difficult to take out some code and put it in a separate common subroutine. I have also been pondering about using code references, callback subroutines or a "function factory," but none of these solutions seemed to bring any significant advantage. Since there is no reason to make things more complicated or to use more advanced techniques when doing that does not bring any significant advantage, I'll leave it at that for now.

Rankings in Perl 6

There is one thing that I don't like too much in my P5 solution above: I feel that my subroutines should probably not have printed the rankings, but rather returned a ranking string (or array) to be printed by the caller. I'm feeling too lazy to change it by now in the P5 solution, as this is secondary matter, but, at least, I'll not do that in the Perl 6 solution. Other than that (and leaving aside the small syntax changes), it will be essentially the same solution.

use v6;

sub standard (%scores) {
    my ($prev_rank, $prev_val, $rank, $rankings ) = 0, 0, 0, "";
    for sort {%scores{$_}}, keys %scores -> $key {
        $rank++;
        if (%scores{$key} > $prev_val) {
            $rankings ~= "$rank\t$key\t%scores{$key}\n";
            $prev_rank = $rank;
        } else {
            $rankings ~= "$prev_rank\t$key\t%scores{$key}\n";
        }
        $prev_val =  %scores{$key};
    }
    return $rankings;
}

sub modified (%scores) {
    my ($prev_val, $rank, @rankings, @buffer) = 0, 0;
    for sort {%scores{$_}}, keys %scores -> $key {
        $rank++;
        if (%scores{$key} > $prev_val) {
            push @rankings, ($rank - 1 ~ $_) for @buffer;
            @buffer = ("\t$key\t%scores{$key}");
        } else {
            push @buffer, "\t$key\t%scores{$key}";
        }
        $prev_val =  %scores{$key};
    }
    push @rankings, ($rank ~ $_) for @buffer;
    return join "\n", @rankings;
}

sub dense (%scores) { 
    my ($prev_rank, $prev_val, $rank, $rankings) = 0, 0, 0, "";
    for sort {%scores{$_}}, keys %scores -> $key {
        if (%scores{$key} > $prev_val) {
            $rank++;
            $rankings ~= "$rank\t$key\t%scores{$key}\n";
            $prev_rank = $rank;
        } else {
            $rankings ~= "$prev_rank\t$key\t%scores{$key}\n";
        }
        $prev_val =  %scores{$key};
    }
    return $rankings;
}

my %scores = a => 4, b => 5, c => 3, d => 5, e => 1, f => 4, g => 6, h => 4, i =>6;

my $head = "Rank\tID\tScore";
.say for  "      Standard", $head, standard(%scores);
.say for "\n      Modified", $head, modified(%scores);
.say for "\n      Dense", $head, dense(%scores);

This displays almost the same as the P5 script:

$ perl6 rankings.p6
      Standard
Rank    ID      Score
1       e       1
2       c       3
3       a       4
3       h       4
3       f       4
6       d       5
6       b       5
8       g       6
8       i       6


      Modified
Rank    ID      Score
1       e       1
2       c       3
5       a       4
5       h       4
5       f       4
7       d       5
7       b       5
9       g       6
9       i       6

      Dense
Rank    ID      Score
1       e       1
2       c       3
3       a       4
3       h       4
3       f       4
4       d       5
4       b       5
5       g       6
5       i       6

One syntactic change between P5 and P6 is interesting regarding the use of the sort built-in function. In Perl 6, when the first argument of the sort function is a code block (or a subroutine) taking only one parameter, then that code is not intended to be a comparison block, but a code object implementing the dereference or transformation operation to be applied to the items to be sorted before using the default cmp comparison subroutine. This not only makes the code slightly simpler, but it can also make it faster, since the values thus calculated are cached, so that the derefecencing or transformation is done only once for each value to be sorted (rather than having to do it for each comparison performed during the sort process). In other words, Perl 6 automatically performs a Schwartzian transform on the items to be sorted.

Wrapping up

There was a third challenge this week: Send email using Sparkpost API. For more information, visit the official page. This challenge was proposed by Gabor Szabo. The API challenge is optional and would love to see your solution.

As I already said last week, I know next to nothing about this kind of topic, so I won't undertake anything on that subject and even less blog about it. Please try this challenge and provide answers if you know more that I do on such topic.

As mentioned above, I'll make another blog on the same subject in the next days, covering these challenges with a functional programming approach, in which we will use iterators, closures, function factories and other somewhat exotic ways to solve these challenges.

The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, June 2. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 8: Perfect Numbers and Centered Output

These are some answers to the Week 8 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Challenge # 1: Perfect Numbers

Write a script that computes the first five perfect numbers. A perfect number is an integer that is the sum of its positive proper divisors (all divisors except itself). Please check Wiki for more information. This challenge was proposed by Laurent Rosenfeld.

Since Mohammad Anwar revealed that I suggested this challenge, let me add a few things about it. The idea of this challenge comes from a very long memory: the requirement is almost the same as a programming assignment I had during the first year (actually the first few months) of my CS study back in 1991, except for the fact that we were asked to write a program (in Pascal at the time) to find the perfect numbers below 10,000 (i.e. 4 perfect numbers). If a student can do it in the very first months of his or her CS curriculum, then it would certainly be too easy for a challenge aimed at experienced CS professional. This the reason why my suggestion was to find the first 5 perfect numbers (instead of 4) as this makes a real difference: as we will see, finding the fifth perfect number is significantly more difficult that finding the first 4 ones.

Let me also be clear on something: 1 is not a perfect number, because the definition says the sum of all its divisors except itself. The sum of all divisors except itself of 1 is 0; thus, 1 is not a perfect number. The first perfect number is 6 ( = 1 + 2 + 3).

Perfect Numbers in Perl 5

Brute Force Approach

Let's try brute force approach to grasp the problem. I'll start with my assignment of 1991, i.e. finding the perfect numbers below 10,000:

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";

sub divisors {
    my $num = shift;
    my $limit = 1 + int $num / 2;
    my @divisors = (1);
    for my $test_div (2..$limit) {
        push @divisors, $test_div if $num % $test_div == 0;
    }
    return @divisors;
}

for my $num (2..10000) {
    my $sum_div = 0;
    my @divisors = divisors $num;
    for my $div (@divisors) {
        $sum_div += $div;
    }
    say " $num : @divisors" if $sum_div == $num ;
}

Easy, nothing complicated. I made a separate divisors subroutine because we may later want to reuse it. If we run that program, we get the following output:

$ time perl perfect1.pl
6 : 1 2 3
28 : 1 2 4 7 14
496 : 1 2 4 8 16 31 62 124 248
8128 : 1 2 4 8 16 32 64 127 254 508 1016 2032 4064

real    0m2.421s
user    0m2.343s
sys     0m0.062s

Two and a half seconds to find the first four perfect numbers. It probably took at least ten or twenty times that duration when I completed this assignment with the hardware of 1991 (my computer at the time didn't even have a hard disk, just two 5.5-inch floppy drives), but you would still get the result in a relatively reasonable time.

So, perhaps we can try with numbers up to 100,000, maybe we'll find the next perfect number. Well, that does not work. Not only we do not find any new perfect number with this new limit, but with 10 times more numbers to visit, the program takes almost 4 minutes to execute:

$ time perl perfect1.pl
6 : 1 2 3
28 : 1 2 4 7 14
496 : 1 2 4 8 16 31 62 124 248
8128 : 1 2 4 8 16 32 64 127 254 508 1016 2032 4064

real    3m50.514s
user    3m48.265s
sys     0m0.140s

This program really does not scale well: with ten times more numbers to study, it takes more or less 100 more time to run. The problem stems from the fact that we have two nested loops (even if it's a bit hidden by the fact that one of the loops is in the subroutine and the other calling that subroutine). Note that I very well knew that something like this was going to happen, but nonetheless made the test to get actual timings to illustrate the problem. We are not going to continue in this direction and try with a limit of one million, as this would obviously take hours to run (with no guarantee to find the next perfect number).

Triangular Numbers

When we were given this assignment back in 1991, not only was the hardware much slower than today, but the Web did not exist (well, it had just been created but was still an internal tool of the CERN, it went public only in 1993) and there certainly wasn't something like Wikipedia (which was created in 2001) to find additional information about perfect numbers. Our professor had been quite wise to ask only for the first four perfect numbers.

Looking at the Wikipedia page on perfect numbers, we can find a lot on interesting information on perfect numbers. First, all known perfect numbers are even (but it is not known whether there are odd perfect numbers, it can only be said that none has been found so far). Assuming that we are looking only for even numbers will cut the duration by half, but that's far from sufficient to solve the performance problem.

The article provides an additional piece of information: all even perfect numbers are triangular numbers. This is a list of triangular numbers:

0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, ..., 1225, 1275, 1326, 1378,

You start with 0. Add 1 to it to get the next one (1). Add 2 to second one to get the third one (3). Add 3 to the third one (6). And so on. This is interesting because there becoming increasingly sparce and we'll need to check the divisors of much fewer numbers.

So, the forloop at the end of the previous program is now changed as follows:

my $triangular_num = 1;
for my $num (2..10000) {
    $triangular_num += $num;
    my $sum_div = 0;
    my @divisors = divisors $triangular_num;
    for my $div (@divisors) {
        $sum_div += $div;
    }
    say " $triangular_num : @divisors" if $sum_div == $triangular_num;
}

This is indeed much faster than before, and I was able to get the fifth perfect number after about 45 minutes:

$ time perl perfect.pl
6 : 1 2 3
28 : 1 2 4 7 14
496 : 1 2 4 8 16 31 62 124 248
8128 : 1 2 4 8 16 32 64 127 254 508 1016 2032 4064
33550336 : 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8191 16382 32764 65528 131056 262112 524224 1048448 2096896 4193792 8387584 16775168

and it took more than two hours to complete to the last value of the for loop (without finding any additional perfect number).

So, mission accomplished? Well, yes, in a way, we've got our first five perfect numbers, but I'm not really satisfied with this solution: this is way too long.

Mersenne Numbers

The Mersenne number are powers of 2 minus one. More precisely, for every integer number n, the corresponding Mersenne number is the number (2 ** n) -1. Their name is associated with Marin Mersenne, a French monk living in the seventeenth century. These numbers had been studied long before, notably in antiquity by Greek mathematician Euclid of Alexandria, but Mersenne studied them in detail in the context of primality of large numbers, and he may have demonstrated that a Mersenne number (2 ** n) -1 can be prime only if n is also prime. The opposite is not true: for example, 11 is prime, but (2 ** 11) -1 is not prime. Mersenne prime numbers are still very important in the search of very large prime numbers: as of 2018, the seven largest known prime numbers are Mersenne primes.

Euclid discovered and demonstrated an interesting property about the Mersenne primes (although they were obviously not called like that at the time) and perfect numbers: if a Mersenne number (2 ** p) -1 is prime (which requires p to be also prime), then the number ((2 ** p) -1) * 2 ** (p - 1) is a perfect number. In the eighteenth century, Swiss mathematician Leonhard Euler proved conversely that every even perfect number has this form. This is sometimes known as the Euclid Euler theorem.

This can give us a much faster way to identify perfect numbers: compute Mersenne numbers (2 ** p) -1 with p prime, check if such Mersenne number is itself prime, and if it is, that we know that ((2 ** p) -1) * (2 ** (p - 1)) is a perfect number.

For this, we'll write two suboutines, find_primes, which will populate an array of prime numbers, and is_prime, which will determine if its argument is a prime numbers. Contrary to what some readers might expect, is_prime is not going to be used to populate the list of prime numbers, but things will happen the other way around: is_prime will use the list of primes found by the first subroutine. This may seem counter-intuitive, but how do you find out if a number is prime? The basic test is to try to divide it by all numbers smaller than it and if it is not evenly divided by any of these smaller numbers, then it is prime. For a large number, this can take quite a bit of time. This basic idea can be improved very significantly in two ways. First, we don't need to test even division by all integers smaller than the target number being tested, but we can limit the test to numbers smaller than the square root of the target number. So, for example, if you want to know whether 10,001 is prime, you don't need to test divisibility by all numbers smaller than 10,001, but only by numbers smaller than 100, a gain of a factor of 100. Second, you don't need to test all numbers smaller than the square root of the target number. For example, if you start by trying to divide it by 2, you'll find that 10,001 is not evenly divided by 2. This also means that the target number is not evenly divided by any multiple of 2, so when you continue you can skip the test for all even numbers, 2, 4, 6, 8, 10, etc. Similarly, the next test would be to try to divide by 3. Once you've found that 10,001 is not divisible by 3, you can also skip test with multiples of 3, i.e. 6, 9, 12, 15, etc. This method is called the sieve of Eratosthenes. Pushing the argument further, what you actually need is to test divisibility by prime numbers smaller than the square root of the target number.

So, for our is_prime subroutine to be efficient with a large number, we need to have first a list of all prime numbers smaller than the square root of that larger number we're likely to investigate. Creating this list will take some time, but it will make the is_prime subroutine much faster, and this will be a net gain if we're going to test primality of a relatively large quantity of large numbers. The is_prime subroutine is doing just that: testing divisibility by all primes smaller than the square root of the target number. Note that there are some more efficient primality tests, such as the Miller-Rabin test (or, in the case of Mersenne numbers, the Lucas–Lehmer primality test), but implementing such tests is beyond the scope of this blog post.

The find_primes subroutine uses the same principle, with an added small tweak: it populates the @primes array and use it at the same time, i.e. it uses the list of primes already generated to find out whether a new number is a prime. This can be a bit tricky for very small numbers (especially in view of the fact that 2 is a very special case, as it is the only number that is both even and prime), so we pre-populate the @primes array with the first three primes (2, 3 and 5) and start from there. This subroutine has been previously used and described in a previous Perl Weekly challenge, so you can check there if you want more information on it.

Once we have these two subroutines, we just need to generate Mersenne numbers in the form (2 ** p) -1, with p prime, check whether said Merssenne number is prime, and, if such is the case, generate the ((2 ** p) -1) * 2 ** (p - 1) perfect number associated with it.

The following program finds the first eitght perfect numbers in about 0.3 second:

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";
use constant largest_num => 100000;

sub find_primes {
    my $num = 5;
    my @primes = (2, 3, 5);
    while (1) {
        $num += 2;     # check only odd numbers
        last if $num > largest_num;
        my $limit = int $num ** 0.5;
        my $num_is_prime = 1;
        for my $prime (@primes) {
            last if $prime > $limit;
            if ($num % $prime == 0) {
                # $num evenly divided by $prime, exit the for loop
                $num_is_prime = 0;
                last;
            }
        }
        push @primes, $num if $num_is_prime; #  Found a new prime, add it to the array of primes
    }
    return @primes;
}

my @prime_numbers = find_primes;  
# print "@prime_numbers \n";

sub is_prime {
    my $num = shift;
    my $limit = 1 + int $num ** 0.5;
    for my $p (@prime_numbers) {
        return 1 if $p > $limit;
        return 0 if $num % $p == 0;
    }
    warn "Something got wrong (primes list too small)\n";
    return 0; # If we've reach this point, then our list of primes is 
              # too small, we don't now if the argument is prime, return
              # a false value to be on the safe side of things
}

my @perfect_nums;
for my $prime (@prime_numbers) {
    my $mersenne = 2 ** $prime - 1;
    if (is_prime $mersenne) {
        say "$prime $mersenne ", $mersenne * (2 ** ($prime - 1) );
        push @perfect_nums, $mersenne * (2 ** ($prime - 1) );
        last if scalar @perfect_nums >= 8;
    }
}

say "Perfect numbers: @perfect_nums";

The program displays the following information:

$ time perl perfect.pl
2 3 6
3 7 28
5 31 496
7 127 8128
13 8191 33550336
17 131071 8589869056
19 524287 137438691328
31 2147483647 2305843008139952128
Perfect numbers: 6 28 496 8128 33550336 8589869056 137438691328 2305843008139952128

real    0m0.312s
user    0m0.171s
sys     0m0.077s

The first lines display the p number used for generating the Mersenne number, the Mersenne number itself, and the perfect number corresponding to it. We can see that we checked Mersenne numbers until prime p equal to 31, that Mersenne numbers for primes 11, 23 and 29 have not been retained (they are not prime), and that the eighth perfect number has 19 digits. And the program ran in less than a third of a second.

I have the feeling that it will probably not possible to go further, as the next perfect number has 37 digits, which is probably beyond the integer precision in Perl. We could do it with the Math::BigInt or bigint module (but we might encounter again some long run times). We will try to do that in Perl 6.

Perfect Numbers in Perl 6

The First Four Perfect Numbers in Perl 6

Even though we know by now that the brute force approach will easily yield the first four perfect numbers, but not the fifth one, let's implement our old 1991 CS course assignment in Perl 6, if only for the sake of comparing with Perl 5.

use v6;

sub divisors (Int $num) {
    return 1, | grep { $num %% $_ }, 2 .. (1 + ($num / 2).Int);
}

for 2..10000 -> $num {
    my @divisors = divisors $num;
    say "$num: ", @divisors if $num == [+] @divisors;
}

which produces the following output:

6: [1 2 3]
28: [1 2 4 7 14]
496: [1 2 4 8 16 31 62 124 248]
8128: [1 2 4 8 16 32 64 127 254 508 1016 2032 4064]

Notice how much shorter the P6 code is compared to P5. In fact, the divisors subroutine is now so short that we could inline it in the main code:

use v6;

for 2..10000 -> $num {
    my @divisors = 1, | grep { $num %% $_ }, 2 .. (1 + ($num / 2).Int);
    say "$num: ", @divisors if $num == [+] @divisors;
}

and make it even shorter. Well, if we just want the perfect numbers without willing to print the list of divisors, we could even get rid of the @divisors temporary array and make it short enough for a one-liner:

perl6 -e 'for 2..10000 -> $num { $num.say if $num == [+] (1, | grep { $num %% $_ }, 2 .. (1 + ($num / 2).Int))};'

So, the code is much shorter, but I regret to say that both these Perl 6 versions are also more than 10 times slower that the Perl 5 counterpart.

A Better Algorithm to Speed up Things

Of course, a better algorithm solves this. For example, we may reuse the idea of the triangular numbers:

use v6;

sub divisors (Int $num) {
    return 1, | grep { $num %% $_ }, 2 .. (1 + ($num / 2).Int);
}
my $triangular-num = 1;
for 2..200 -> $num {
    $triangular-num += $num;
    my $sum-div = [+] divisors $triangular-num;
    say " $triangular-num : @divisors[]" if $sum-div == $triangular-num;
    last if $triangular-num > 10000;
}
say now -  INIT now;

Now, finding the first four perfect numbers takes about a third of a second:

~ perl6 triangular_perfect.p6
6
28
496
8128
0.3385362

So, perhaps, rather than complaining about Perl 6 being slow as I just did above, I should have started with a better algorithm.

Using Lazy Infinite Sequences

One of the difficulties in the code above is to determine the required range for $num. The math is not too difficult to compute that the upper bound should be close the square root of twice the ceiling for perfect numbers (i.e. about 141 for a ceiling of 10,000), but I used an upper bound of 200 to be on the safe side of things. The best would be not to have to compute that upper bound. Here come to the rescue lazy infinite lists. For example, we can generate an infinite list @nums of consecutive integers for $num, and Perl 6 will compute them as and when until it reaches the limit for $triangular-num:

use v6;

sub divisors (Int $num) {
    return 1, | grep { $num %% $_ }, 2 .. (1 + ($num / 2).Int);
}
my $triangular-num = 1;
my @nums = 2 ... *;   # Infinite sequence
for @nums -> $num {
    $triangular-num += $num;
    my $sum-div = [+] divisors $triangular-num;
    say $triangular-num if $sum-div == $triangular-num;
    last if $triangular-num > 10000;
}
say now -  INIT now;

It is more concise and probably more idiomatic to generate directly an infinite sequence of triangular numbers:

use v6;

sub divisors (Int $num) {
    return 1, | grep { $num %% $_ }, 2 .. (1 + ($num / 2).Int);
}   
my $num = 1;
my @triangular-numbers = 1,  * + ++$num ... *;
for @triangular-numbers -> $triangular-num {
    last if $triangular-num > 10000;
    say $triangular-num if $triangular-num == [+] divisors $triangular-num;
}
say now -  INIT now;

The key code line here is where the @triangular-numbers sequence is defined. It is an infinite sequence using an explicit generator: each value is created by adding the previous value and a number, $num, which is itself incremented at each step through the process.

Rather than building an infinite list, we can build a list of the triangular numbers less than 10,000 (thereby making the last statement in the loop unnecessary) by adding a code block saying where the sequence should stop:

use v6;

sub divisors (Int $num) {
    return 1, | grep { $num %% $_ }, 2 .. (1 + ($num / 2).Int);
}   
my $num = 1;
my @triangular-numbers = 1,  * + ++$num ...^ * > 10000 ;
.say if $_ == [+] divisors $_ for @triangular-numbers;
say now - INIT now;

This program displays the same four perfect numbers:

perl6 triangular_perfect.p6
1
6
28
496
8128
0.2695492

You might object that the requirement was to display the first five perfect numbers, and we haven't done that yet in Perl 6. Yes, indeed. Even if our program now runs pretty fast, we know from the experience in Perl 5 that this won't be fast enough for the fifth perfect number. So, just as for P5, we'll use the Mersenne Primes in P6.

Mersenne Prime Numbers in Perl 6

In our Perl 5 program using Mersenne numbers, we spent quite a bit of energy trying to make an efficient is_prime subroutine.

We don't need that in Perl 6, since it has a built-in function, is-prime, which uses the very fast Miller-Rabin algorithm to find out whether a number is prime. To tell the truth, the Miller-Rabin algorithm is a probabilistic test, which may in theory report that a composite number is prime, but the probability of occurrence of such an event is so abysmally low that we can just dismiss it for practical purposes.

The other advantage of P6 is its built-in ability to do integer calculations with arbitrary precision.

This a Perl 6 program displaying the first 12 perfect numbers in about half a tenth of a second:

use v6;

for grep { is-prime $_ }, 0..300 -> $prime {
    my $mersenne =  2 ** $prime - 1;
    say $mersenne * 2 ** ($prime - 1) if is-prime $mersenne;
}
say "time taken ", now - INIT now;

This displays the following:

~ perl6 mersenne_perfect.p6
6
28
496
8128
33550336
8589869056
137438691328
2305843008139952128
2658455991569831744654692615953842176
191561942608236107294793378084303638130997321548169216
13164036458569648337239753460458722910223472318386943117783728128
14474011154664524427946373126085988481573677491474835889066354349131199152128
time taken: 0.0533907

The last number displayed has 77 digits, and calculating these 12 perfect numbers took less than 0.06 second.

Computing the first 15 perfect numbers took 0.64 second on my old inefficient laptop, and the fifteenth perfect number (built with prime number 1279) has 770 digits.

Centering Text

Write a function, 'center', whose argument is a list of strings, which will be lines of text. The function should insert spaces at the beginning of the lines of text so that if they were printed, the text would be centered, and return the modified lines.

For example,

center("This", "is", "a test of the", "center function");

should return the list:

"     This", "      is", " a test of the", "center function"

because if these lines were printed, they would look like:

       This
        is
   a test of the
  center function

The Center Subroutine in Perl 5

This challenge is fairly simple. This is a possible implementation in Perl 5:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw / say /;

sub center {
    my $max_size = 0;
    for my $str (@_) {
        my $length = length $str;
        $max_size = $length if $length > $max_size;
    }
    map { " " x (int ($max_size - length) / 2) . $_} @_;
}   

my @strings = ("This", "is", "a test of the", "center function");
say for center @strings;

This displays the following output:

$ perl center.pl
     This
      is
 a test of the
center function

We could also use the max function of the core List::Util module to make the center subroutine simpler:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw / say /;
use List::Util qw / max /;

sub center {
    my $max_size =  max map length, @_;
    map { " " x (int ($max_size - length) / 2) . $_} @_;
}

my @strings = ("This", "is", "a test of the", "center function");
say for center @strings;

This script displays the same output as before.

Note that Perl 5 has the format functionality which could be used for centering pieces of text. I have barely used it, perhaps 2 or 3 times in the last 15 years or so, and I believe that it has become somewhat obsolete. Besides, I don't think using it here would really satisfy the requirement (and it would be overkill).

The Center Subroutine in Perl 6

Perl 6 has a built-in max function that we can use to compute the size of the longest string much in the same way as in our second P5 implementation:

use v6;

sub center (@str) {
    my $max-size = max map { .chars }, @str;
    return map { " " x (($max-size - .chars) / 2).Int ~ $_}, @str;
}

my @strings = "This", "is", "a test of the", "center function";
.say for center @strings;

This script displays the following output:

~ perl6 center.pl6
     This
      is
 a test of the
center function

Although I personally prefer the functional style when using map, we could also do it with a chained method-invocation syntax:

sub center (@str) {
    my $max-size = @str.map({.chars}).max;
    @str.map({" " x (($max-size - .chars) / 2).Int ~ $_})
}

my @strings = "This", "is", "a test of the", "center function";
.say for center @strings;

Wrapping up

There was a third challenge this week: Create a simple Perl client for @mailgun API. I'm not sure I even understand the requirement, but I guess this probably has to do with creating a Web interface. Since I know next to nothing about this kind of topic, I won't undertake anything on that subject and even less blog about it

The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, May 26. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 7: Niven Numbers and Word Ladders

These are some answers to the Week 7 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Challenge # 1: Niven Numbers

Print all the niven numbers from 0 to 50 inclusive, each on their own line. A niven number is a non-negative number that is divisible by the sum of its digits.

A Niven number or harshad number is a strictly positive integer that can be evenly divided by the sum of its digits. Note that this property depends on the number base in which the number is expressed (the divisibility property is intrinsic to a pair of numbers, but the sum of digits of a given number depends on the number base in which the number is expressed). Here we will consider only numbers in base 10.

Please also note that 0 cannot be a divisor of a number. Therefore, 0 cannot really be a Niven number. We'll start at 1 to avoid problems.

Niven Numbers in Perl 5

For a simple problem like this, I can't resist doing in with a Perl one-liner:

$ perl -E 'for my $num(1..50) { my $sum = 0; $sum += $_ for (split //, $num); say $num if $num % $sum == 0;}'
1
2
3
4
5
6
7
8
9
10
12
18
20
21
24
27
30
36
40
42
45
48
50

If you don't like one-liners, this how a full-fledged Perl 5 script could look like:

use strict;
use warnings;
use feature qw/say/;

for my $num(1..50) { 
    my $sum = 0; 
    for (split //, $num) { 
        $sum += $_; 
    } 
    say $num if $num % $sum == 0;
}

This prints the same as the above one-liner.

Niven Numbers in Perl 6

We can simply translate the P5 one-liner in P6:

$ perl6 -e 'for 1..50 -> $num { my $sum = [+] $num.comb; say $num if $num %% $sum}'

This prints out the same values as before.

Note that the code is more concise that its P5 counterpart, in large part because the use of the [+] hyper-operator makes is possible to compute the sum of the digits in just one expression.

Since you can have both an if and a for statement modifier in the same statement in Perl 6, you can make this one-liner even more concise:

$ perl6 -e '.say if $_ %% [+] $_.comb for 1..50'

And it you prefer a real script, this is one way it could be done:

use v6;
.say for gather { 
    for 1..50 -> $num { 
        my $sum = [+] $num.comb; 
        take $num if $num %% $sum 
    }
}

Challenge # 2: Word Ladders

A word ladder is a sequence of words [w0, w1, …, wn] such that each word wi in the sequence is obtained by changing a single character in the word wi-1. All words in the ladder must be valid English words.

Given two input words and a file that contains an ordered word list, implement a routine that finds the shortest ladder between the two input words. For example, for the words cold and warm, the routine might return:

("cold", "cord", "core", "care", "card", "ward", "warm")

However, there’s a shortest ladder: (“cold”, “cord”, “card”, “ward”, “warm”).

The text of the challenge provides the following additional information:

  • Givens:

    1. All words in the list have the same length.

    2. All words contain only lowercase alphabetical characters.

    3. There are no duplicates in the word list.

    4. The input words aren't empty and aren't equal but they have the same length as any word in the word list.

  • Requirements:

    1. The routine must return a list of the words in the ladder if it exists. Otherwise, it returns an empty list.

    2. If any of the input words is the wrong length (i.e., its length is different to a random from the word list) or isn’t in the word list, return an empty list.

According to this Wikipedia page, word ladder puzzles were invented by the famous writer and mathematician Lewis Carroll in 1877, at a time when there was obviously no computer.

To comply with given # 1, I'll break up my word.txt list into files with 2-letter words, 3-letter words, and so on, although this is really not necessary: it would be trivial to filter out the words with a different letter count when reading the word.txt file.

My word.txt input file only contains words with only lowercase alphabetical ASCII characters.

I'll slightly depart from given # 4 and Requirement # 1: if the input words are equal, I'll simply return that word as being the ladder. And I'll abort the program if the input words have different lengths.

Just as for some previous challenges, I will use a words.txt file containing 113,809 lower-case English words usually accepted for crossword puzzles and other word games. The words.txt file can be found on my Github repository. The original list was contributed to the public domain by Internet activist Grady Ward in the context of the Moby Project. This word list is also mirrored at Project Gutenberg.

For the purpose of testing the programs below, the words.txt file is located in my current directory. Obviously, when we will be reading the list, we will need to keep only the words having the same length as the two input words.

This is task that is much more complicated than the other challenge of this week (and than most previous challenges). In fact, my first reaction when reading the problem was, "Gosh, I've got no idea how I'm going to solve that." In such case, it is often a good idea to try to break up the problem into smaller ones.

The first thing that we must be able to do is to figure out whether one word can be transformed into another with just one letter change. It would probably be also very useful to know whether this can be done with two letter changes, three letter changes, etc. For this, we may want to use a well-known CS string metric named the Levenshtein distance or Levenshtein edit distance, which is the smallest number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. In the case of this challenge, however, we probably don't need to consider insertions and deletions, but are interested only in substitutions.

Once we have a routine to compute the Levenshtein distance, we might try to use brute force with backtracking to test all possibilities, or an optimized version thereof able to remove non optimal paths relatively early in the process, or a branch and bound algorithm, or implement some form of Dijstra's algorithm for shortest paths.

Word Ladders in Perl 5

Computing the Edit Distance In Perl 5

There are a number of CPAN modules to compute the Levenshtein distance on the CPAN, such as Text::Levenshtein and some variants. However, we will not use any of them for several reasons. One is that using a ready-made library for solving such a challenge is kind of cheating; I especially don't wish to use a non-core module. The second reason is that, as we have seen above, the general Levenshtein distance is the smallest number of single-character edits (insertions, deletions or substitutions) required to change one word into the other, where as, since our input words have the same length, we're interested with only the smallest number of substitutions. The last reason is that, because of the second reason just stated, computing the full Levenshtein distance would be overkill: we really need to compute only the number of letter differences for each given position in the input words. Because of that, I'll use the term edit distance, rather than Levenshtein distance.

Once we've clarified in our head what we need, it is quite simple to write our edit_distance subroutine and some code to test it:

use strict;
use warnings;
use feature qw/say/;
sub edit_distance {
    my ($word1, $word2) = @_;
    die "Words $word1 and $word2 have different lengths\n" unless length $word1 == length $word2;
    my @w1 = split //, $word1;
    my @w2 = split //, $word2;
    my $dist = 0;
    for my $i (0..$#w1) {
        $dist++ if $w1[$i] ne $w2[$i];
    }
    return $dist;
}
for my $word_pair_ref (["cold", "cord"], ["cord", "core"], ["cord", "cord"], 
        ["cold", "warm"], ["kitten", "sittin"], ["kitten", "sitting"]) {
    my ($w1, $w2) = @$word_pair_ref;
    say "Distance between $w1 and $w2 is: \t", edit_distance ($w1, $w2);
}

This script displays the expected output:

Distance between cold and cord is:      1
Distance between cord and core is:      1
Distance between cord and cord is:      0
Distance between cold and warm is:      4
Distance between kitten and sittin is:  2
Words kitten and sitting have different lengths

So, we have a working edit_distance subroutine, a ladder subroutine might look in part like this:

sub ladder {
    my $word1, $word2) = @_;
    return ($word1) if edit_distance($word1, $word2) == 0;
    return ($word1, $word2) if edit_distance($word1, $word2) == 1;
    # ...
}

The problem, of course, is to write the code that will replace the ellipsis. Eventually, the real code will not really look like that, but it could.

Finding the Ladders in Perl 5

Let's try an improved brute force algorithm. For this, we will first analyze our input file and prepare data structures that are likely to make brute force relatively fast.

Preparing the Data

To start with, we will use our words.txt file to create files of words having 2, 3, 4, ... 9, 10 letters. This can be done with simple one-liners such as:

$ perl -nE 'chomp; say if length == 10' words.txt >  words10.txt

We're not really interested with longer words, because it will become increasingly difficult and unlikely to find complete ladders between such longer words.

Our 9 files have the following word counts:

113809  113809 1016714 words.txt
9199     9199  101189 words10.txt
  85       85     255 words2.txt
 908      908    3632 words3.txt
3686     3686   18430 words4.txt
8258     8258   49548 words5.txt
14374   14374  100618 words6.txt
21727   21727  173816 words7.txt
26447   26447  238023 words8.txt
16658   16658  166580 words9.txt
220447  220447 1932357 total

The next thing we want to do is to build hashes which, for each word in the input file, lists all words that are a single-letter-change away from that given word. This implies nested loops on large data collections; this can take a very long time to run, so we want to cache the result in order to avoid having to recompute it every time. That's what the first 40 code or so lines in the program below do. We use the Storable core module to store the generated hash in a file. Note that it would be very easy to store our %words hash of arrays in a plain text file and to retrieve it on demand (for example, one line per hash key, with the key and the list of words on eash such line), and that's what we'll do below in P6, but Storable does it well and fast for us. The basic idea is as follows: we get two words as parameters to the program; if the words have the same length, say 5, we look for a data store file named word_store_5 on the disk. If the file is there, we just retrieve the data and load the hash from it (which takes a split second); if not, we generate the hash and store it. This way, these long computations can be done only once for each word length.

The %words hash of arrays is in effect a graph of the single-letter-edit connections between words.

Some words are what I call "orphans" in the code, i.e. there are not connected to any other word through a single character edit. When legendary computer scientist Donald Knuth studied the problem several decades ago, he found that the word "aloof" was one such word. So he named "aloof" all of these words not connected to any other. In the code below, these words are stored in "aloof" files, despite the fact that my word list has in fact the word "kloof" (whatever it means) which is one character edit away from "aloof." We remove these words from our hash, since they cannot participate in any path between two words.

The table below summarizes word counts and run times for this part of the process.

| File        | Words   | Run time   | Aloof words |
| ----------- | ------- | ---------- | ----------- |
| words2.txt  |  85     | 0m0.102s   | 0           |
| words3.txt  |  908    | 0m2.519s   | 6           |
| words4.txt  |  3686   | 0m38.162s  | 68          |
| words5.txt  |  8258   | 4m46.641s  | 711         |
| words6.txt  |  14374  | 16m29.848s | 3093        |
| words7.txt  |  21727  | 39m5.278s  | 7348        |
| words8.txt  |  26447  | 71m47.795s | 12516       |
| words9.txt  |  16658  | 28m6.781s  | 10096       |
| words10.txt |  8258   | 7m51.305s  | 6494        |

When words have 9 characters or more, significantly more than half of them are "aloof" words. It probably becomes relatively difficult to find pairs of words that are connected with a ladder.

As you can see, this process takes quite a lot of time when there are many words in the input file (more than 71 minutes for words having 8 characters), it is good to store the data produced to avoid recomputing it. Note that is might be quicker not to do that and to go directly for the ladders if we were to determine a path between only two words, but I ran the program probably a couple of hundred times for the purpose of testing and finding information about our word lists, so I'm happy that I first took the time to prepare the data for faster processing later. When the word store already exists, it takes about 0.1 to 0.3 second to reconstruct the hash from the store.

Finding the Word Ladders

This is my Perl 5 code for finding word ladders:

use strict;
use warnings;
use feature qw/say/;
use Storable;
use Data::Dumper;

die "Please pass two words as parameters" unless @ARGV == 2;
my ($word1, $word2)= @ARGV;
my $length = length $word1;
die "The two words must have the same length\n" if $length != length $word2;

my $store_file = "word_store_$length";
my ($store_ref, %words);
if (-e $store_file) {
    my $store_ref = retrieve($store_file);
    %words = %$store_ref;
} else {
    my $file = "words$length.txt";
    open my $IN, '<', $file or die "Cannot open $file$!";
    while (my $word = <$IN>) {
        chomp $word;
        $words{$word} = [];
        for my $key (keys %words) {
            if (edit_distance($key, $word) == 1) {
                push @{$words{$key}}, $word;
                push @{$words{$word}}, $key;
            }
        }
    }
    close $IN;
    my $orphans = "aloof_$length.txt";
    open my $OUT, ">", $orphans or die "Cannot open file $orphans$!";
    for my $key (keys %words){
        if (scalar @{$words{$key}} == 0) {
            say $OUT "$key";
            delete $words{$key}; 
        }
    }
    close $OUT;
    store \%words, $store_file;  
}

my $max = $le   ngth * 2;

sub edit_distance {
    my ($word1, $word2) = @_;
    # die "Words $word1 and $word2 ..." -> No longer needed as this is checked before
    my @w1 = split //, $word1;
    my @w2 = split //, $word2;
    my $dist = 0;
    for my $i (0..$#w1) {
        $dist++ if $w1[$i] ne $w2[$i];
    }
    return $dist;
}

sub ladder {
    my ($word1, $word2, $tmp_result) = @_;
    return $tmp_result if $word1 eq $word2;
    return [] if scalar @$tmp_result >= $max;
    my @temp_solutions;
    for my $word (@{$words{$word1}}) {
        next if $word eq $word1;
        next if grep { $_ eq $word } @$tmp_result; # not really needed but a bit faster
        push @temp_solutions, [@$tmp_result, $word] and last if $word eq $word2;
        my $new_tmp = ladder($word, $word2, [@$tmp_result, $word]);
        next if scalar @$new_tmp == scalar @$tmp_result;
        next unless scalar @$new_tmp;
        push @temp_solutions, $new_tmp;
    }
    return [] unless @temp_solutions;
    my $best_sol = (sort { scalar @$a <=> scalar @$b } @temp_solutions)[0];
    $max = scalar @$best_sol if scalar @$best_sol < $max;
    return $best_sol;
}

for ($word1, $word2) {
    die "Word $_ not found\n" unless exists $words{$_};
}
my $ladder = ladder $word1, $word2, [$word1];

if (@$ladder) {
    say join "->", @$ladder;
} else {
    say "No ladder found for $word1 and $word2"
}

The bulk of the work is done in the ladder subroutine, which calls itself recursively for all words connected to the $word1 input word. The $max variable, which controls how deep we go into the recursive research, may not be needed for the correctness of the algorithm (provided we find another way to stop recursion), but it enables the program to run tens to hundreds of times faster, depending on the input words.

This is an example run:

$ time perl ladder2.pl warm cold
warm->ward->card->cord->cold

real    0m2.959s
user    0m2.859s
sys     0m0.077s

Printing out intermediate results shows that there are other shortest ladders between these two words, for example:

warm worm word wold cold
warm worm word cord cold
warm worm corm cord cold
warm ward card cord cold

There is one caveat: it should be noted that I have initialized the $max variable to twice the length of the input words. In a way, this is a bug because there are certainly some (probably very rare) pairs of four-letter words for which the shortest ladder contains more than 8 words, but I have kept it this way as a trade-off because the program runs so dramatically faster with a relatively low value for $max. If we needed to make sure that we don't miss any shortest ladder, we could run it the way it is now (so that it is fairly fast most of the time) and, when no ladder is found, run it again with a much larger initial value of $max. According to this site, the longest shortest ladder (with words of six letters) has 50 steps.

As a conclusion to the Perl 5 implementation of word ladders, I should say that, although it seems to work properly (subject to the caveat just above), I'm not really fully satisfied with this solution: I think it should be possible to make something simpler (and perhaps faster), but I don't have the time at the moment to investigate further.

Word Ladders in Perl 6

Let's try to adapt the P5 script to P6.

Serializing the Word Hash of Arrays

There doesn't seem to be a Storable module in Perl 6, but I did not worry about that, since I thought that the gist routine would provide a serialized image of the %words hash which could then be EVALed to retrieve the hash. This seems to work fine with a small hash. But that does not work with our big %words hash of arrays, because it turns out that, apparently, gist truncates its output to a few thousands characters.

It would probably be possible to serialize the %words hash with some JSON or YAML module, but there doesn't seem to be any core module for that.

As noted in the P5 section of this challenge, there is nothing complicated in writing our own plain text serializer for a simple hash of arrays. For example, we can write a plain text file with one line for each hash item, with the key at the beginning of the line and then a list of the values. Let's do it in a little toy module, which could look like something like this:

unit package Store;

sub store (%hash, $file) is export {
    # stores a hash or array as lines containing key et values
    # such as: key | val1 val2 val3
    my $out;
    for %hash.kv -> $key, $val {
        $out ~= "$key | $val \n";
    }
    spurt $file, $out;
}
sub retrieve (%hash, $file) is export {
    # populates a hash of arrays with stored data
    for $file.IO.lines -> $line {
        my ($key, $val) = split /\s?\|\s?/, $line;
        %hash{$key} = $val.words;
    }
}

As an example, the first few lines of the word_store_4 file look like this:

yawl | bawl pawl wawl yawn yawp yaws yowl 
pled | fled bled pied plea peed gled pleb plod sled 
pita | dita pima pica pika pina pith pits pity vita 
keir | heir weir 
quag | quad quai quay 
frug | frig frog frag drug

The Ladder Script in Perl 6

Now that we have solved the problem of storing and retrieving the hash of arrays, adapting the P5 script into Perl 6 is fairly easy:

use v6;
use Store;

die "Please pass two words as parameters" unless @*ARGS == 2;
my ($word1, $word2)= @*ARGS;
my $length = $word1.chars;
die "The two words must have the same length\n" if $length != $word2.chars;

my $max = 2 * $length;
my $store-file = "word_store_$length";
my ($stored, %words);
if ($store-file.IO.e) {
    retrieve %words, $store-file;
} else {
    for "words$length.txt".IO.lines -> $word { 
        %words{$word} = [];
        for keys %words -> $key {
            if (edit-distance($key, $word) == 1) {
                push @(%words{$key}), $word;
                push @(%words{$word}), $key;
            }
        }
    }
    %words = grep { $_.value.elems > 0 }, %words.pairs; 
    store %words, $store-file;
}

sub edit-distance (Str $word1, Str $word2) {
    my @w1 = $word1.comb;
    my @w2 = $word2.comb;
    my $dist = 0;
    $dist++ if @w1[$_] ne @w2[$_] for (0..@w1.end) ;
    return $dist;
}

sub ladder (Str $word1, Str $word2, $tmp-result) {
    return $tmp-result if ($word1 eq $word2);
    return [] if @$tmp-result.elems >= $max;
    my @temp-solutions;
    for @(%words{$word1}) -> $word {
        next if $word eq $word1;
        next if grep { $_ eq $word }, @$tmp-result;
        push @temp-solutions, [|@$tmp-result, $word] and last if $word eq $word2;
        my $new_tmp = ladder($word, $word2, [|@$tmp-result, $word]);
        next if @$new_tmp.elems == @$tmp-result.elems;
        next unless @$new_tmp.elems;
        push @temp-solutions, $new_tmp;
    }
    return [] if @temp-solutions.elems == 0;
    my $best_sol = (sort { $_.elems }, @temp-solutions)[0];
    $max = @$best_sol.elems if @$best_sol.elems < $max;
    return $best_sol;
}

for ($word1, $word2) {
    die "Word $_ not found\n" unless  %words{$_} :exists;
}
my $ladder = ladder $word1, $word2, [$word1];

if (@$ladder) {
    say join "->", @$ladder;
} else {
    say "No ladder found for $word1 and $word2"
}

Running the script with the words "warm" and "cold" produces the following output:

$ perl6  ladder.p6 warm cold
warm->worm->word->cord->cold

$ perl6  ladder.p6 cold warm
cold->wold->word->worm->warm

Wrapping up

The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating to this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, May 19. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 6: Ramanujan's Constant

These are some answers to Challenge 2 of the Week 6 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Challenge 1 (compact numeric ranges) was covered in this post.

Challenge 2 was as follows:

Create a script to calculate Ramanujan’s constant with at least 32 digits of precision. Find out more about it here (Wikipedia link).

The Wikipedia link provided in the question concerning this second challenge was apparently changed some time after the challenge was initially posted.

The original link, posted on Monday, April 29, 2019, was pointing to the Landau-Ramanujan Constant, which relates to the sum of two squares theorem.

Then, two days later, on May 1, 2019, I noticed that the link had changed and pointed towards this other Wikipedia page about Ramanujan's constant, which refers to irrational (well, in this case, actually transcendental) numbers that almost look like integers. I do not know when exactly the Wikipedia link was changed.

I guess that my good friend Mohammad Anwar got carried away when writing the challenge because it related to one of his most famous fellow citizens, Indian mathematician Srinivasa Ramanujan (1887-1920). If you've never heard about Ramanujan or don't know much about him, please visit the Wikipedia article just mentioned and search further on the Internet; he is, despite limited access to other mathematicians of the time for a large part of his very short life, one of the greatest mathematicians of the early twentieth century.

I worked on the original link (and the Landau-Ramanujan Constant) for a couple of days, only to find out on May 1, 2019 that the question is now different.

The good thing in this apparent mistake is that I've got two challenges for the price of one. Thank you, dear Mohammad, for the bonus.

Original Challenge (Landau-Ramanujan Constant)

The Landau-Ramanujan Constant is the only constant that was traditionally associated with the name of Ramanujan, at least until relatively recently.

I'll not explain what this constant is, as the best I would be able to do would be to quote or paraphrase the Wikipedia page on the subject.

The one thing missing in that page though (at least in its English version) is the formula of the Euler product (i.e. an infinite product using prime numbers) to compute the value of this constant. Strangely, the French Wikipedia page on the same subject, which admits being essentially a translation of the English page, has the following formula:

Ramanujan_1.gif

It is an infinite product for values of p that are prime numbers congruent to 3 mod 4 (i.e. prime numbers in the form 4 * k + 3, with k an integer). The first numbers satisfying these properties are: 3, 7, 11, 19, 23, 31, 43, 47, ...

The first digits of the Landau-Ramanujan constant found on this Internet page are:

K = 0.7642236535892206629906987312500923281167905413934...

A Perl 5 Implementation of the Landau-Ramanujan Constant Calculation

Implementing this formula first requires to build a (long) list of prime numbers and then to keep only those that are congruent to 3 mod 4. Here, I'll simply reuse the find_primes subroutine which I wrote for the challenge on 5-smooth numbers in Perl Weekly Challenge # 3. Please refer to that blog post if you want explanations on how it works.

Once we know how to generate a long list of primes, computing the formula above is not too complicated:

use strict;
use warnings;
use feature 'say';
use constant largest_num => 100_000;

sub find_primes {
    my $num = 5;
    my @primes = (2, 3, 5);
    while (1) {
        $num += 2;     # check only odd numbers
        last if $num > largest_num;
        my $limit = int $num ** 0.5;
        my $num_is_prime = 1;
        for my $prime (@primes) {
            last if $prime > $limit;
            if ($num % $prime == 0) {
                $num_is_prime = 0;
                last;
            }
        }
        push @primes, $num if $num_is_prime; 
    }
    return @primes;
}
my @prime_numbers = find_primes;
my @primes_for_rama = grep { $_ % 4 == 3 } @prime_numbers;

my $product = 1;
for my $p (@primes_for_rama) {
    my $term = (1 - (1 / $p**2)) **(-1/2);
    $product *= $term;
}
say $product / 2**(1/2);

Here, we compute a list (@prime_numbers) of all prime numbers below 100,000 and then use a grep to keep in @primes_for_rama those that are congruent to 3 modulo 4. Finally, we iterate over this array to compute the Euler product.

With primes up to 100,000, we obtain the following result:

$ time perl rama.pl
0.764223499932459

real    0m0.137s
user    0m0.078s
sys     0m0.015s

The first six digits after the decimal point (0.764223) are correct, and the calculation is pretty fast (less than 0.14 sec). If we want more accurate digits, we need a larger list of prime numbers. Let's see how it scales.

| ------------ | -------------- | -------------- |
| Primes up to | Correct digits | Run time (sec) |
| ------------ | -------------- | -------------- |
| 100,000      | 0.764223       | 0,137          |
| 1,000,000    | 0.7642236      | 1.109          |
| 10,000,000   | 0.76422365     | 20.237         |
| 100,000,000  | 0.7642236535   | 433.43         |
| ------------ | -------------- | -------------- |

With a ceiling of 100 million for the primes, we're getting 2,880,950 eligible primes and as many terms to multiply, and the run time increases to more that 7 minutes.

We can see in the table above that we have here a perfect example of exponential explosion: each time we multiply the ceiling of our array of prime numbers by 10, we only get roughly one more digit of accuracy on the result. The infinite product is converging extremely slowly, and it gets slower and slower. There is just no way we'll be able to obtain 32 digits, as requested in the original challenge (at least not with this approach).

It is actually worse than that. Since the calculations are performed in floating-point arithmetic, numbers get rounded and, at some point, the value obtained no longer changes. For example, the value of the constant obtained for primes until 100,000,000 is 0.76422365350875 (with the four last digits wrong) and took more than 7 minutes to compute. With primes up to 500,000,000, run time exceeded 74 minutes, and the value obtained (including the four wrong digits at the end) was just exactly the same.

Using the Math::BigRat or Math::BigFloat core modules would certainly make it possible to solve the latter problem, but they would make the first problem even worse, because big numbers arithmetic is generally much slower than normal floating-point arithmetic. There are some ways to alleviate this problem to some extent, but certainly not to the point of getting anywhere near 32 digits in a reasonable amount of time.

I'll leave it at that: I was able to compute relatively easily the first ten digits of the Landau-Ramanujan constant, but computing 32 digits seems to be out of reach. Or, at least, it involves complexities beyond the scope of such a challenge.

A Perl 6 Implementation of the Landau-Ramanujan Constant Calculation

Perl 6 will make our code much simpler, but we will most probably hit the same limitations as the P5 solution.

For the P6 implementation, I'll use a slightly different version of the Euler product for the Landau-Ramanujan constant, which I found on the Wayback Machine:

Ramanujan_2.gif

The reason for choosing this variant of the formula is that it might make it possible to make most of the calculations with simple arithmetic operations on Rats and to postpone the determination of the square root until the last calculation.

use v6;

my @primes = grep { .is-prime }, map { 4 * $_ + 3},  0..1_000_000; 
my @terms =  map { 1 / (1 - (1/($_ * $_)) ) }, @primes;
my $product = ([*] @terms) /2; 
say $product ** (1/2);

The results are essentially the same as with the P5 implementation (0.76422365, i.e. 8 correct digits with an input range of 0..1_000_000 corresponding to 141,643 eligible primes and terms in the product), and 10 correct digits when going for a ten-times larger range.

But it is much slower than the P5 solution. One reason is that a lot of the calculations are done with RAT type instead of floats, which means computing things with a much greater accuracy, even if, in that specific case, that greater accuracy doesn't bring any obvious benefit. Another (more important) reason is that, although the ìs-prime built-in method is remarkably fast, it still takes a lot of time to execute when you run it a million times. And, I hate to say, Perl 6 is still significantly slower than Perl 5 for such CPU intensive computations.

Updated Challenge (Ramanujan's Constant)

What has become known as the Ramanujan Constant in the recent period is a number that is an "almost integer" and has in fact little to do with mathematician Srinivasa Ramanujan.

This number is the following one:

Ramanujan_3.gif

As you can see, there are nine 9 digits after the decimal point, so that this number, which is built from a formula involving exponentials on one algebraic and two transcendental numbers, almost looks like an integer (when rounded to less than 9 digits after the decimal point).

The number in question had been discovered by mathematician Charles Hermitte in 1859, more than 25 years before Ramanujan’s birth.

The reason why it has become known as Ramanujan’s constant is that, in 1975, "recreational mathematics" columnist Martin Gardner published in Scientific American an April fool article where he claimed that said number, calculated from algebraic and transcendental numbers, was in fact an integer, and further claimed that Ramanujan has already discovered that in the early twentieth century. This was just a joke, as this number is transcendental, but is an impressively close approximation of an integer. At the time, computers were not able to compute this number with enough accuracy to disproof Gardner's assertion. Following that, people have started to call this number Ramanujan’s constant (Ramanujan worked on a number of similar numbers and probably also on this one, but there is no evidence that he discovered anything significantly new on that specific number).

Perl 5 Solution: Using Big Number Modules

Using regular floating-point arithmetic obviously does not work because it does not provide sufficient accuracy:

$ perl -E '$pi = 3.14159265358979323846; say exp ($pi * sqrt 163);'
2.62537412640768e+17

I've been reluctant so far to use external Perl modules for the answers to Perl Weekly Challenge, because I considered that using a module to solve the challenge was sort of cheating a bit. Here, there are two arguments that make me think differently: first, I'm going to use core modules (i.e. modules that are part of the core Perl system), and I am going to use modules that are not giving you a ready-made algorithm, but just provide you with sufficient accuracy.

We can use the Math::BigFloat core module.

use strict;
use warnings;
use Math::BigFloat "bpi"; 
my $sqrt_163 = Math::BigFloat->new(163)->bsqrt;
my $big_e =  Math::BigFloat->new(1)->bexp;
printf "%.33s\n", $big_e ** (bpi() * $sqrt_163);

which duly prints Ramanujan's constant with an accuracy of 32 significant digits:

262537412640768743.99999999999925

Here I wanted to use the usual arithmetic operators (which are overloaded for BigFloat objects), because I thought that it would make the formula more readable, but I have to admit the syntax looks a bit cumbersome. Using chained method invocations makes the code more concise (you should read the code implementing the math formula from right to left):

use strict;
use warnings;
use feature "say";
use Math::BigFloat "bpi";
say Math::BigFloat->new(163)->bsqrt->bmul(bpi)->bexp(32);

This outputs Ramanujan's constant just as before:

262537412640768743.99999999999925

However, I'm not a great fan of chained method invocations. Using the bigrat core module, which is just a wrapper around other big number libraries, allows a function call syntax and makes the code more concise, so that we can satisfy the requirement with a simple Perl one-liner:

$ perl -Mbigrat=PI,bexp -E 'say bexp(PI * sqrt(163), 32);'
262537412640768743.99999999999925

Note that if we reduce the accuracy to 29 digits (instead of 32):

$ perl -Mbigrat=PI,bexp -E 'say bexp(PI * sqrt(163), 29);'
262537412640768744.00000000000

we obtain a good visual illustration of the reason why Ramanujan's constant is sometimes called an "almost integer".

Perl 6 Solution Using the Builtin FatRat Type

The Wikipedia page on Ramanujan's constant and the formula given at the beginning of the Update Challenge section of this post show that the integer part of this constant is equal to 640_320 ** 3 + 744 (i.e. 262537412640768744). The Wikipedia article further explains that the difference between this number and Ramanujan's constant is given by:

Ramanujan_4.gif

So we just need to apply this formula. Let's do it under the Rakudo REPL:

> my $a = 640_320 ** 3 + 744; # $a is the integer approximation of R's constant
262537412640768744
> my $r-constant = $a - 196844 / $a;
262537412640768743.999999999999250225
> say $r-constant.fmt("%.33s");
262537412640768743.99999999999925

Note that we are a bit lucky: the value obtained for $r-constant has an accuracy of 33 digits, and we only need 32. Using the FatRat type (instead of the implicit Rat type used above) does not improve accuracy, it is the math formula that is an approximation of Ramanujan’s constant.

Wrapping up

The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating to this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, May 12. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 6: Compact Number Ranges

These are some answers to the Week 6 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler alert: this post reveals breaking details about the plot of forthcoming episodes of Game of Thrones. Oops, no, that's not what I meant to say. As of this writing, the Perl Weekly Challenge # 6 is still going until Sunday May 10, 2019, please don't read on if you intend to solve the challenge by yourself.

The Wikipedia link provided in the question concerning the second challenge (Ramanujan's Constant) was apparently changed some time after the challenge was initially posted. I worked on the original link (and the Landau-Ramanujan Constant), only to find out a couple of days later that the question is now different. Because of that, I'll hopefully cover the second challenge in a later post (and will try to provide at least partial answers for both constants associated with the name of Srinivasa Ramanujan).

Anyway, given the situation, this blog post will cover only the first proposed challenge, which relates to compact numeric ranges.

Create a script which takes a list of numbers from command line and print the same in the compact form. For example, if you pass “1,2,3,4,9,10,14,15,16” then it should print the compact form like “1-4,9,10,14-16”.

In the input example provided with the question, the numbers are sorted in ascending order, but there is no reason to limit ourselves to such a case. The question is then the following: if the input is not in ascending order, are we supposed to keep the order provided and compact the sub-ranges, or are we supposed to reorder the numbers and compact the ranges over the sorted data? In other words, if the input data is "1,2,3,4,9,10,14,15,16,5,6,7", are we supposed to display:

1-4,9,10,14-16,5-7

or:

1-7,9,10,14-16?

I chose the first interpretation, but, as we will see, there wouldn't be much code to change to follow the second interpretation.

I will suppose that the input data is correct (i.e. a list of space-separated numbers) and not try to validate the input.

Perl 5 Compact Ranges

So this is my initial attempt:

use strict;
use warnings;
use feature 'say';

my @input = @ARGV > 0 ? @ARGV : (1,2,3,4,9,10,14,15,16,3,4,5,6,4,5,6,7,9,9);
my $prev = my $start = shift @input;
for my $num (@input) {
    if ( $prev == $num - 1 ) {
            $prev = $num;
    } else {
        print $prev == $start ? "$prev," : "$start-$prev,";
        $start = $prev = $num;
    }
}
say $prev == $start ? $prev : "$start-$prev";

Nothing complicated. Just note that if the user doesn't pass any parameter to the script, I have provided a default list of numbers. This is only to make my tests easier.

This works as follows:

$  perl num_ranges.pl 1 2 3 4 9 10 14 15 16 3 4 5 6 9
1-4,9-10,14-16,3-6,9

So, this seems to work properly. If I wanted to use the second interpretation, I would only need to change the for loop statement and the next line as follows:

for my $num (sort { $a <=> $b } @input) {
    next if $num == $prev;
    if ( $prev == $num - 1 ) {
            $prev = $num;
    } else {
        print $prev == $start ? "$prev," : "$start-$prev,";
        $start = $prev = $num;
    }
}

This modified version displays the following output:

$  perl num_ranges.pl 1 2 3 4 9 10 14 15 16 3 4 5 6 9
1-6,9-10,14-16

But there is a slight problem with what we've done so far. When there are only two consecutive numbers (such as 9,10 in the examples above), the required output is "9,10" and not "9-10". Thus, the above solution is not completely correct.

Fixing the Initial Perl 5 solution

Let's try to fully satisfy the requirement.

This means that the conditional in both print statements becomes more complicated, as we now need three-way comparisons. In addition, I wasn't entirely satisfied that this comparison is repeated at two different places: I don't like repeating code when I can avoid it. So I changed the code to move these comparisons out of the main code into a compare subroutine.

use strict;
use warnings;
use feature 'say';

sub compare {
    my ($prev, $start) = @_;
    if ($prev > $start + 1) {
        return "$start-$prev";
    } elsif  ($prev > $start) {
        return "$start,$prev";
    } else {
        return "$prev";
    }
}

my @input = @ARGV > 0 ? @ARGV : (1,2,3,4,9,10,14,15,16,3,4,5,6,4,5,6,7,9,9);
my $prev_val = my $start_val = shift @input;
my $output = "";   
for my $num (@input) {
    if ($num != $prev_val + 1) {
        $output .= compare ($prev_val, $start_val) . ",";
        $start_val = $num;
    }
    $prev_val = $num;
}
$output .= compare ($prev_val, $start_val);
say $output;

Note that I also decided to build progressively an $output variable, rather than printing the result piece by piece.

The result is similar to what we had before, but this time with commas between the 9 and 10:

$  perl num_ranges.pl 1 2 3 4 9 10 14 15 16 3 4 5 6 9
1-4,9,10,14-16,3-6,9

Now that it has become more complicated, I have the feeling that the code gets a bit large and clumsy for such a simple requirement.

We can make the compare subroutine a bit more concise:

sub compare {
    my ($prev, $start) = @_;
    return $prev > $start + 1 ? "$start-$prev" 
        : $prev > $start     ? "$start,$prev"
        : "$prev";
}

This is slightly better, but still feels somewhat clumsy.

A Recursive Perl 5 Approach

Let's see if a recursive approach is better. This could look like this:

use strict;
use warnings;
use feature 'say';

my @input = @ARGV > 0 ? @ARGV : (1,2,3,4,9,10,14,15,16,3,4,5,6,4,5,6,7,9,9);

sub process_input {
    my ($range, $input, $output) = @_;
    my $curr_val = shift @$input;
    if ($curr_val == $range->[1] + 1) {
        $range->[1] = $curr_val;
    } else {
        my $sep = $range->[1] > $range->[0] + 1 ? "-" : ",";
        $output .= (join $sep, @$range) . ",";
        $range = [$curr_val, $curr_val];
    }
    return $output if @$input == 0;
    process_input ($range, $input, $output);
}

my $first = shift @input;
my $output = process_input([($first) x 2], \@input, ""); 
chop $output;
say $output;

The process_input subroutine takes three parameters: * The $range is an array ref containing the first element of a consecutive sequence and the last seen element; * $input is a ref to the unprocessed input; and * output is the string where we build up the result.

This is slightly more concise, but only by a thin margin (and it is probably a bit less easy to understand).

I have also been thinking about trying a functional programming approach, some form of a "lispy" data flow or data pipeline solution using chained map and grep routines, but wasn't able to come up with something that would truly look better. Well, I can't really think of some more elegant way to solve the problem in Perl 5. I look forward to seeing what other participants to the challenge have done.

Perl 6 Compact Ranges

Initially, I did not see any way of doing this that would be very different from the P5 version.

This is a Perl 6 adaptation of the corrected P5 version:

use v6;
sub compare ($prev, $start) {
    return $prev > $start + 1 ?? "$start-$prev" 
        !! $prev > $start     ?? "$start,$prev"
        !! "$prev";
}

my @input = @*ARGS.elems > 0 ?? |@*ARGS !! (1,2,3,4,9,10,14,15,16,3,4,5,6,4,5,6,7,9,9);
my $prev_val = my $start_val = shift @input;

my $output = "";
for @input -> $num {
    if ($num != $prev_val + 1) {
        $output ~= compare($prev_val, $start_val) ~ ",";
        $start_val = $num;
    }
    $prev_val = $num;
}
$output ~= compare $prev_val, $start_val;
say $output;

With no argument passed to the script, it uses the default array and prints out:

$ perl6 num_range.p6
1-4,9,10,14-16,3-6,4-7,9,9

Apart from the slight syntax adjustment between P5 and P6, there is not much difference.

Note that I also thought about using given ... when statements in the compare subroutine, but I ended up feeling that it made the code longer and less expressive for this kind of case. The given ... when "switch" statement is nice when implicitly using the smart match operator, for example to check the topic against a value, a regex or a type, but using it in a purely procedural fashion at it would be the case here gave me the unpleasant impression of going back in time by several decades and writing Pascal or Ada code in Perl 6. It's probably just a personal bias.

The recursive approach in P6 would essentially look like the P5 recursive version, except for the fact that using dynamic scope variables might simplify the passing of arguments between successive recursive calls. Not very interesting.

Compact Ranges in Perl 6: a Functional Approach

Then I thought again about one of my pet subjects: why not try a functional programming approach? Perl 6 provides the gather ... take control flow statement, which can be thought as a generalized version of map, grep, and return: it is sort of a map in which you can also filter items (as in a grep), or of a grep in which you can also map various elements to something else. And you can return the data to the gather statement at the point you choose.

Please note that there were two bugs in the original version that I presented here. Many thanks to Alexander who pointed out to them.

Here we go:

use v6;
sub get($start, $prev) {
    take $prev > $start + 1 ?? "$start-$prev" 
        !! $prev > $start     ?? "$start,$prev"
        !! "$prev";
}
my @input = @*ARGS.elems > 0 ?? |@*ARGS !! (1,2,3,4,9,10,14,15,16,3,4,5,6,4,5,6,7,9,10,11);
my $prev = my $start = shift @input;
my @result = gather {
    for @input -> $num {
        if $num != $prev + 1 {
            get $start, $prev;
            $start = $num;
        }
    $prev = $num;
    }
    get $start, $prev;
}
say @result.join(",");

That may not be perfect, but I'm much more satisfied with this than with all previous versions, be it P5 or P6.

The Perl Weekly Challenge # 6 is still going on, you're welcome to participate before May 5, 2019, 6 p.m., UK time.

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.