August 2019 Archives

Perl Weekly Challenge # 23: Difference Series and Prime Factorization

These are some answers to the Week 23 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (September 1, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: nth Order Difference Series

Create a script that prints nth order forward difference series. You should be a able to pass the list of numbers and order number as command line parameters. Let me show you with an example.

Suppose we have list (X) of numbers: 5, 9, 2, 8, 1, 6 and we would like to create 1st order forward difference series (Y). So using the formula Y(i) = X(i+1) - X(i), we get the following numbers: (9-5), (2-9), (8-2), (1-8), (6-1). In short, the final series would be: 4, -7, 6, -7, 5. If you noticed, it has one less number than the original series. Similarly you can carry on 2nd order forward difference series like: (-7-4), (6+7), (-7-6), (5+7) => -11, 13, -13, 12.

nth Order Difference Series in Perl 5

For this, we will write a simple fwd_diff subroutine to compute the first order difference series of the list of values passed as arguments to it. We do that with map on the indexes of the arguments list (starting at 1).

Then, we use a for loop to call this subroutine the number of times required by the first parameter (the order) passed to the script. Note that if the order is larger than the count of the other items passed to the script, then we cannot compute the result.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub fwd_diff {
    return map $_[$_] - $_[$_ - 1], 1..$#_;
}

my ($order, @values) = @ARGV;
my $count = scalar @values;
if ($count <= $order) {
    die "Can't calculate ${order}th series of $count values";
}
my @result = @values;
for (1..$order) {
    @result = fwd_diff @result;
}
say "${order} forward diff of @values is: @result";

Testing with 6 values the forward difference series with orders 1 to 6 displays the following output:

$ perl  fwd_diff.pl 1 5 9 2 8 1 6
1th forward diff of 5 9 2 8 1 6 is: 4 -7 6 -7 5

$ perl  fwd_diff.pl 2 5 9 2 8 1 6
2th forward diff of 5 9 2 8 1 6 is: -11 13 -13 12

$ perl  fwd_diff.pl 3 5 9 2 8 1 6
3th forward diff of 5 9 2 8 1 6 is: 24 -26 25

$ perl  fwd_diff.pl 4 5 9 2 8 1 6
4th forward diff of 5 9 2 8 1 6 is: -50 51

$ perl  fwd_diff.pl 5 5 9 2 8 1 6
5th forward diff of 5 9 2 8 1 6 is: 101

$ perl  fwd_diff.pl 6 5 9 2 8 1 6
Can't calculate 6th series of 6 values at fwd_diff.pl line 13.

nth Order Difference Series in Perl 6

I would have liked to be able to use a pointy block syntax with two parameters, but that does not work because the loop will consume two values at each step, as shown under the REPL:

> for <5 9 2 8 1 6> -> $a, $b {say $b - $a}
4
6
5

So we would need to pre-process the input data in order to get twice all values except those at both ends of the input list.

We'll use the rotor built-in function

These are two examples using rotor under the REPL:

> <5 9 2 8 1 6>.rotor(1)
((5) (9) (2) (8) (1) (6))
> <5 9 2 8 1 6>.rotor(2)
((5 9) (2 8) (1 6))

In these examples, rotor groups the elements of the invocant into groups of 1 and 2 elements respectively.

The rotor method can take as parameter a key-value pair, whose value (the second item) specifies a gap between the various matches:

> (1..10).rotor(2 => 1)
((1 2) (4 5) (7 8))

As you can see, we obtain pairs of values, with a gap of 1 between the pairs (item 3, 6 and 9 are omitted from the list). Now, the gap can also be negative and, with a gap of -1, we get all successive pairs from the range:

> <5 9 2 8 1 6>.rotor(2 => -1)
((5 9) (9 2) (2 8) (8 1) (1 6))

This is exactly what we need: we can now subtract the first item from the second one in each sublist.

Continuing under the REPL, we can define the fwd-diff subroutine and use it as follows:

> sub fwd-diff (*@in) { map {$_[1] - $_[0]},  (@in).rotor(2 => -1)}
&fwd-diff
> say fwd-diff <5 9 2 8 1 6>
[4 -7 6 -7 5]
>

OK, enough experimenting with the REPL, we now know how to solve the challenge and can write our program:

use v6;

sub fwd-diff (*@in) { 
    map {$_[1] - $_[0]},  (@in).rotor(2 => -1)
}
sub MAIN (Int $order, *@values) {
    if @values.elems <= $order {
        die "Can't compute {$order}th series of {@values.elems} values";
    }
    my @result = @values;
    for 1 .. $order {
        @result = fwd-diff @result;
    }
    say "{$order}th forward diff of @values[] is: @result[]";
}

Testing with 6 values the forward difference series with orders 1 to 6 displays the following output:

$ fwd-diff.p6 1 5 9 2 8 1 6
1th forward diff of 5 9 2 8 1 6 is: 4 -7 6 -7 5

$ fwd-diff.p6 2 5 9 2 8 1 6
2th forward diff of 5 9 2 8 1 6 is: -11 13 -13 12

$ fwd-diff.p6 3 5 9 2 8 1 6
3th forward diff of 5 9 2 8 1 6 is: 24 -26 25

$ fwd-diff.p6 4 5 9 2 8 1 6
4th forward diff of 5 9 2 8 1 6 is: -50 51

$ fwd-diff.p6 5 5 9 2 8 1 6
5th forward diff of 5 9 2 8 1 6 is: 101

$ fwd-diff.p6 6 5 9 2 8 1 6
Can't compute 6th series of 6 values
  in sub MAIN at fwd-diff.p6 line 9
  in block <unit> at fwd-diff.p6 line 1

Note that I was hoping to get rid of the if @values.elems <= $order test and related die block by using a constraint in the signature of the MAIN subroutine, for example something like this:

sub MAIN (Int $order, *@values where @values.elems > $order) { # ...

but that does not appear to work properly.

Prime Factorization

Create a script that prints Prime Decomposition of a given number. The prime decomposition of a number is defined as a list of prime numbers which when all multiplied together, are equal to that number. For example, the Prime decomposition of 228 is 2,2,3,19 as 228 = 2 * 2 * 3 * 19.

Prime Factorization in Perl 5

The simplest way to solve this challenge is called trial division, i.e. to divide the input number by successive integers until the result is 1. This may appear to be a silly brute force approach, but it turns out to be fairly fast even for the largest integers that Perl 5 can natively handle (there is nothing in the challenge specification that says that we should be able to handle very large numbers). The only performance enhancements that we'll do here is to test even division by 2 and then only by successive odd numbers, and exit the loop when $div becomes too large. I thought for a moment that it would be worth to test only prime numbers, but first finding prime numbers would take more time than what we are likely to win.

We store the prime factors in a hash, with the key being a factor and the value the number of times this factor is a divisor of the input number.

The fact that factors are taken out of the number $num in ascending order garantees the list will only contain primes.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my $num = shift;
my %factors;
while ($num % 2 == 0) {
    $factors{2} ++;
    $num /= 2;
}
my $div = 1;
while (1) {
    $div += 2;
    while ($num % $div == 0) {
        $factors{$div} ++;
        $num /= $div;
    }
    last if $num == 1;
    ++$factors{$div} and last if $div * 2  > $num;
}
for my $fact (sort { $a <=> $b } keys %factors) {
    say "$fact ** $factors{$fact}";
}

This is the output for some test values:

$ perl prime-fact.pl 12
2 ** 2
3 ** 1

$ perl prime-fact.pl 1200
2 ** 4
3 ** 1
5 ** 2

$ perl prime-fact.pl 1280
2 ** 8
5 ** 1

$ time perl prime-fact.pl 128089876
2 ** 2
463 ** 1
69163 ** 1

real    0m0,055s
user    0m0,015s
sys     0m0,030s


$ time perl prime-fact.pl 1280898769976
2 ** 3
7 ** 2
1783 ** 1
1832641 ** 1

real    0m0,118s
user    0m0,078s
sys     0m0,030s

As we can see on the last test, even for a number with 13 digits and one relatively large prime factor, the computation takes less than 0.2 second. With larger numbers having large prime factors, this might take a few seconds, but that's OK, I'm satisfied with that.

Prime Factorization in Perl 6

Perl 6 has a fast is-prime built-in routine that we can use to build a lazy infinite list of prime numbers, so that we can try even division by prime factors only.

use v6;

my @primes = grep {.is-prime}, 1..*;

sub MAIN (UInt $num is copy) {
    my %factors;
    for @primes -> $div {
        while ($num %% $div) {
            %factors{$div}++;
            $num div= $div;
        }
        last if $num == 1;
        ++%factors{$num} and last if $num.is-prime;
    }
    for sort {$^a <=> $^b}, keys %factors -> $fact {
        say "$fact ** %factors{$fact}";
    }
    say now - INIT now; # timings
}

Note that this line:

++%factors{$num} and last if $num.is-prime;

isn't really needed but brings a significant performance enhancement when the last factor to be found is very large, as it can be seen in the last three tests below (in such cases, Perl 6 runs significantly faster than Perl 5):

$ perl6 prime-fact.p6 12
2 ** 2
3 ** 1
0.0129253

$ perl6 prime-fact.p6 1200
2 ** 4
3 ** 1
5 ** 2
0.01692924

$ perl6 prime-fact.p6 1280
2 ** 8
5 ** 1
0.01294

$ perl6 prime-fact.p6 128089876
2 ** 2
463 ** 1
69163 ** 1
0.052831

$
$ perl6 prime-fact.p6 1280898769976
2 ** 3
7 ** 2
1783 ** 1
1832641 ** 1
0.1106868

$ perl6 prime-fact.p6 128089876997685
3 ** 1
5 ** 1
29 ** 1
37 ** 1
179 ** 1
44460137 ** 1
0.051871

perl6 prime-fact.p6 12808987699768576
2 ** 8
509 ** 1
98300801969 ** 1
0.0469033

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September, 8. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 22: Sexy Prime Pairs and Compression Algorithm

These are some answers to the Week 22 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (August 25, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Sexy Prime Pairs

Write a script to print first 10 Sexy Prime Pairs. Sexy primes are prime numbers that differ from each other by 6. For example, the numbers 5 and 11 are both sexy primes, because 11 - 5 = 6. The term “sexy prime” is a pun stemming from the Latin word for six: sex. For more information, please checkout wiki page.

My first question, when reading this definition, was whether sexy primes had to be consecutive prime numbers. The example provided (as well as those found in the the Wikipedia page) shows that it needs not be the case: 5 and 11 are not consecutive primes (since 7 is also prime). If sexy primes had to be consecutive primes, then the first such pair would be (23, 29). With that answer to my question, it seems to me that all we need to do is to look at each prime number p and check whether p + 6 is prime (and stop as soon as we have 10 sexy pairs.

Note that (1, 7) is not a sexy prime pair (despite having a gap of 6), because 1 is not considered to be a prime number. Therefore, to avoid the risk of finding a false sexy prime pair, we will start our search with number 2.

Sexy Prime Pairs in Perl 6

We first build a lazy infinite list @sexy-primes of prime numbers such that each such prime + 6 is also prime, and then print the pairs:

use v6;

my @sexy-primes = grep { .is-prime and ($_ + 6).is-prime}, (2, 3, *+2 ... Inf);
say "@sexy-primes[$_] ", @sexy-primes[$_] + 6 for ^10;

Note that, as a basis for finding the primes, we use a sequence operator with an explicit generator in order to check parity only for odd numbers. This avoids useless computations on even numbers which cannot be prime (except for 2). This might be considered premature optimization (and we all know what Donald Knuth said about premature optimization). Well, yes, but, at the same time, I don't like to let my programs do unnecessary work.

And this prints:

$ perl6 sexy-pairs.p6
5 11
7 13
11 17
13 19
17 23
23 29
31 37
37 43
41 47
47 52

This program is so short that we can easily get rid of the @sexy-primes temporary array and transform the script into a Perl6 one-liner:

$ perl6 'say "$_ ", $_+6 for (2...*).grep({.is-prime && ($_ + 6).is-prime})[^10];'
5 11
7 13
11 17
13 19
17 23
23 29
31 37
37 43
41 47
47 53

Sexy Prime Pairs in Perl 5

Since we know from our tests with Perl 6 that we're not going to look for prime numbers much larger than 50, we don't need to try hard to optimize the primality check subroutine as we've done in some previous weekly challenges. Our is_prime subroutine will simply test all possible factors between 2 and the square root of the number being checked.

Since Perl 5 doesn't have infinite lists, we will just use an infinite while loop instead and break out of it with a last statement once we've found what we need.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub is_prime{
    my $num = shift;
    for my $i (2 .. $num ** .5) {
        return 0 if $num % $i == 0;
    }
    return 1;
}

my ($candidate, $count) = (2, 0);
while (1) {
    if (is_prime $candidate and is_prime $candidate + 6) {
        say "$candidate ", $candidate + 6;
        $count ++
    }
    last if $count >= 10;
    $candidate ++;
}

And this prints out the same output as our P6 programs, no point of repeating it here.

Note that the program runs in 0.08 second, there was really no need to try to optimize performance.

Lempel–Ziv–Welch (LZW) Compression

Write a script to implement Lempel–Ziv–Welch (LZW) compression algorithm. The script should have method to encode/decode algorithm. The wiki page explains the compression algorithm very nicely.

Lempel–Ziv–Welch (LZW) is a lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978.

The scenario described by Welch encodes sequences of 8-bit data as fixed-length 12-bit codes. The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence with no code yet in the dictionary. The code for the sequence (without that character) is added to the output, and a new code (for the sequence with that character) is added to the dictionary.

For encoding (or, really, compressing) a string, we buffer input the characters in a sequence (note that we use here the variables names from the Wikipedia page to facilitate understanding) until the next is not in the %dict hash. Emit the code for , and add plus the next character to the hash. Start buffering again with the next character. Concretely, we first populate the %dict hash with the single possible letters. Then, we traverse the input string character by character and build the sequence as long as it exists in the dict hash. When the new sequence to be built does not exist in the hash, we add the previous sequence to the result, add the new one to the hash and start a new sequence with the last visited character.

For decoding (decompressing), we use the same initial hash as when encoding (we don't need the final hash, so we don't need to transmit the dictionary, which can be hard coded). Additional entries can be reconstructed as they are always simply concatenations of previous entries. Concretely, we populate %dict hash as before, but inverting keys and values. Then we go through the codes one by one; if a code exists in the hash, we just convert it and add it to the output; else, we build the new sequence, add it to the output and add the sequence concatenated with the sequence's first character to the hash.

There is nothing specific to either Perl 5 or Perl 6 in the above explanations, so they apply to both our P5 and P6 implementations below.

LZW Compression in Perl 6

For a start, we will use an input string ('TOBEORNOTTOBEORTOBEOR...') consisting only of capital letters ('A'..'Z'), as in the Wikipedia article, and populate our initial hash %dict with corresponding numeric codes between 0 and 25.

use v6;

constant $start-dict-size = 26;

sub encode (Str $in) {
    my %dict = map { $_[0] => $_[1] }, 
        ( ('A'..'Z') Z (^$start-dict-size) );
    my $ω = "";
    my @result = gather {
        for $in.comb -> $c {
            my $ωc = $ω ~ $c;
            if %dict{$ωc}:exists {
                $ω = $ωc;
            } else {
                take %dict{$ω};
                %dict{$ωc} = +%dict;
                $ω = $c;
            }
        }
        take %dict{$ω} if $ω.chars;
    }
    # say %dict;
    return @result;
}
sub decode (@encoded) {
    my $dict-size = $start-dict-size;
    my %dict = map { $_[1] => $_[0] }, 
        ( ('A'..'Z') Z (^$start-dict-size) );
    my $ω = %dict{shift @encoded};
    my @result = gather {
        take $ω; 
        for @encoded -> $i {
            my $str;
            if %dict{$i}:exists {
                $str = %dict{$i};
            } elsif  $i == $dict-size {
                $str = $ω ~ $ω.substr(0,1) 
            }
            take $str;
            %dict{$dict-size++} = $ω ~ $str.substr(0,1);
            $ω = $str;
        }
    }
    return join "", @result;
}

my $input_str = 'TOBEORNOTTOBETOBEORNOTTOBETOBEORNOTTOBE';
my @encoded = encode $input_str;
say @encoded;
say decode @encoded;

Running this code produces a correct round trip and displays the following output:

$ perl6 LZW_compression.p6
[19 14 1 4 14 17 13 14 19 26 28 35 29 31 33 37 37 30 32 34 27 4]
TOBEORNOTTOBETOBEORNOTTOBETOBEORNOTTOBE

The encoded (compressed) code has 22 numbers that could each be encoded over 6 bits, so that's a total of 132 bits. The input string had 39 bytes, i.e. 312 bits. In other words, we obtain a compression ratio of 2.36. Admittedly, we could have used a fixed-length encoding scheme and encoded each character of the input string over 5 bits, which would have led to a total of 195 bits, leading to a compression ratio of 1.6. We still get an LZW compression ratio which is 1.47 times better than a fixed-length encoding.

The reason for this better compression ratio is that many of our numeric codes represent two letters of the input, and some of them even more letters; for example, numeric code (35) stands for 3 letters, "TOB", and code 37 stands for 4 letters, "TOBE":

19 14 1 4 14 17 13 14 19 26 28 35  29 31 33 37   37   30 32 34 27 4
T  O  B E O  R  N  O  T  TO BE TOB EO RN OT TOBE TOBE OR NO TT OB E

Encoding only ASCII capital letters is of course very limited. Leaving aside Unicode, we would like at least to be able to compress bytes encoded over 256 bits. For this, we only need to change the $start-dict-size constant to 256 and to populate the initial %dict hash accordingly. For example, this way for the encode subroutine:

my %dict = map { .chr => $_ }, ^$start-dict-size;

And this way in the decode subroutine:

my %dict = map { $_ => .chr }, ^$start-dict-size;

The compressed code still has 22 numbers, but the compression rate would fall down, because these numbers would now need to be encoded over more bits:

[84 79 66 69 79 82 78 79 84 256 258 265 259 261 263 267 267 260 262 264 257 69]

And we can now compress data not comprising only of capital ASCII letters. For example, with the following input string:

To be or not to be, to be or not to be, that's the question

we obtain the following output:

perl6 LZW_compression.p6
[84 111 32 98 101 32 111 114 32 110 111 116 32 116 257 259 44 268 270 260 262 264 266 
273 258 101 272 116 104 97 116 39 115 268 104 260 113 117 101 115 116 105 111 110]
To be or not to be, to be or not to be, that's the question

LZW Compression in Perl 5

As for the P6 implementation, we will use an input string ('TOBEORNOTTOBE...') consisting only of capital letters ('A'..'Z'), as in the Wikipedia article, and populate our initial hash %dict with corresponding numeric codes between 0 and 25. Translating the Perl 6 implementation into Perl 5 is a bit tedious because of all these small pesky syntax differences between P5 and P6, but is conceptually a piece of cake. Here we go:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say/;
use constant start_dict_size => 256;
use utf8;

sub encode {
    my $in = shift;
    my %dict = map { chr $_ => $_ } 0 .. start_dict_size - 1;
    my $ω = "";
    my @result;

    for my $c (split //, $in) {
        my $ωc = $ω . $c;
        if (exists $dict{$ωc}) {
            $ω = $ωc;
        } else {
            push @result, $dict{$ω};
            $dict{$ωc} = scalar keys %dict;
            $ω = $c;
        }
    }
    push @result, $dict{$ω} if length $ω;
    return @result;
}
sub decode {
    my @encoded = @_;
    my $dict_size = start_dict_size;
    my %dict = map { $_ => chr } 0 .. start_dict_size - 1;;
    my $ω = $dict{shift @encoded};
    my @result = ($ω); 
    for my $i (@encoded) {
        my $str;
        if (exists $dict{$i}) {
            $str = $dict{$i};
        } elsif  ($i == $dict_size) {
            $str = $ω . substr $ω, 0, 1; 
        } else { die "Error on $i" }
        push @result, $str;
        $dict{$dict_size++} = $ω . substr $str, 0, 1;
        $ω = $str;
    }
    return join "", @result;
}

my $input_str = 'TOBEORNOTTOBETOBEORNOTTOBETOBEORNOTTOBE';
my @encoded = encode $input_str;
say "@encoded";
say decode(@encoded);

The round trip works as before:

$ perl LZW_compression.pl
84 79 66 69 79 82 78 79 84 256 258 265 259 261 263 267 267 260 262 264 257 69
TOBEORNOTTOBETOBEORNOTTOBETOBEORNOTTOBE

(The numerical codes are not the same as in the original P6 implementation because we've used directly a starting %dict hash with 256 entries, but they are the same as in out second P6 test.)

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September 1. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 21: Euler's Number and URL Normalizing

These are some answers to the Week 21 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (August 18, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Euler's Number

Write a script to calculate the value of e, also known as Euler’s number and Napier’s constant. Please checkout this wiki page) for more information.

The number e is a mathematical constant that is the base of natural logarithms: It is the unique number whose natural logarithm is equal to 1 and its value is approximately 2.71828.

Euler's Number in Perl 6

Perl 6 has a built-in constant, e, for Euler's number, which we can just print:

$ perl6 -e 'say e'
2.718281828459045

So, job done? Well, maybe it is sort of cheating. Let's try to compute e really.

Let's try the formula used by Jacob Bernoulli in 1683: e is equal to the limit, when n tends to infinity, of (1 + 1/n)**n. We can just use this formula with a large input number:

use v6;

sub eul ($n) { (1 + 1/$n)**$n}

sub MAIN (Int $n) {
    say eul $n;
}

Let's try to run this program with increasing input numbers:

$perl6  euler.p6 5
2.48832

$perl6  euler.p6 10
2.5937424601

$perl6  euler.p6 100
2.7048138294215263

$perl6  euler.p6 1000
2.7169239322358925

$perl6  euler.p6 10000
2.718145926825225

It works, but the formula converges very slowly: with an input number of 10,000, we obtain only 4 correct digits. Let's try with a better formula. Euler's constant is equal to the sum, for n from 0 to infinity, of 1/n!, where n! is the factorial of n, i.e. the product of all positive integers between 1 and n.

For computing this, we will first define a new postfix operator, !, to compute the factorial of any number, and then use it to compute the sum. For this, we will use twice the [...] reduction metaoperator, which reduces a list of values with the given infix operator. For example,

say [+] 1, 2, 3, 4;   #  -> 10

is equivalent to:

say 1 + 2 + 3 + 4;

i.e. works as if the infix operator (+ in this example) was placed between each item of the list to produce an arithmetic expression yielding a single numerical value. This is the perfect functionality for computing both the factorial of an integer and the sum of terms of the formula.

use v6;

sub postfix:<!> (Int $n) {   # factorial operator
    [*] 2..$n;
}
sub eul (Int $n) {
    [+] map { 1 / $_! }, 0..$n;
}
sub MAIN (Int $n) {
    say eul $n;
}

The version with this new formula converges much faster than the original one:

$ perl6  euler.p6 10
2.7182818

$ perl6  euler.p6 100
2.718281828459045

Euler's Number in Perl 5

We don't have a builtin constant for e in Perl 5, but we can cheat almost as easily as in Perl 6:

$ perl -E 'print exp 1'
2.71828182845905

But, of course, we don't want to cheat: that wouldn't be a real challenge.

We've seen with our Perl 6 coding experiments that computing Euler's constant as the limit of the sum, for n from 0 to infinity, the terms 1/n! is quite efficient, as the result converges rather fast. We'll use the same formula in Perl 5.

Although Perl 5 does not allow the construction of new operators (such as the factorial ! operator in our P6 script) and does not have the [] reduction metaoperator, we can easily write subroutines for the same purposes.

Although I don't use them very commonly (because I am stuck on several servers at $work with old versions of Perl), I'll use here the subroutine signatures feature.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say signatures/;
no warnings 'experimental::signatures';

sub fact ($n) {
    my $fact = 1;
    $fact *= $_ for 2..$n;
    return $fact;
}
sub eul ($n) {
    my $euler;
    $euler += 1 / fact $_ for 0..$n;
    return $euler;
}

say eul shift;

This works as expected and produces the following output:

$ perl euler.pl 10
2.71828180114638

$ perl euler.pl 100
2.71828182845905

The first test with input 10 displays 8 correct digits, and all digits are correct in the second test with input 100.

URL Normalization

Write a script for URL normalization based on rfc3986. This task was shared by Anonymous Contributor.

According to Wikipedia, URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized URL so it is possible to determine if two syntactically different URLs may be equivalent.

URL normalization does not appear to be a well normalized process. Some of the changes may be useful for some purposes and unwanted in others. In the scripts suggested below, I have limited the changes to normalizations that preserve semantics plus removing dot-segments among the normalizations that usually preserve semantics. Other normalization rules are often unwanted or poorly defined.

To summarize, we will perform the following normalization actions:

  • Converting to lower case,
  • Capitalizing letters in escape sequences,
  • Decoding percent-encoded octets of unreserved characters,
  • Removing the default port,
  • Removing dot-segments.

URL Normalization in Perl 6

We will simply apply a series of successive regex substitutions to the URL, one (or in one case two) for each of the normalization actions.

In the normalize subroutine of the program below, we topicalize the URL (with the given keyword), so that we can use directly the regex substitution operator on the topical $_ variable. This simplifies the substitutions. We can write simply:

s:g/'/./'/\//;

instead of having to write, for each of the substitutions, something like:

$url ~~ s:g/'/./'/\//;

Each of the substitutions in the program below is commented to explain to which normalization action it refers to.

use v6;
use Test;

sub normalize (Str $url is copy) {
    constant $unreserved = (0x41..0x5A, 0x61..0x7A, 0x2D, 0x2E, 0x5F, 0x7E).Set;
    given $url {
        s:g/(\w+)/{lc $0}/;      # Lowercase letters
        s:g/('%'\w\w)/{uc $0}/;  # Capitalizing letters in escape sequences
        s:g/'%'(<xdigit>**2)     # Decoding percent-encoded octets
           <?{ (+"0x$0") (elem) $unreserved }> # code assertion
           /{:16(~$0).chr}/;
        s/':' 80 '/'/\//;        # Removing default port
        s:g/'/../'/\//;          # Removing two-dots segments
        s:g/'/./'/\//;           # Removing dot segments
    }
    return $url;
}

plan 5;
for < 1 HTTP://www.Example.com/              
        http://www.example.com/
      2 http://www.example.com/a%c2%b1b      
        http://www.example.com/a%C2%B1b
      3 http://www.example.com/%7Eusername/  
        http://www.example.com/~username/
      4 http://www.example.com:80/bar.html   
        http://www.example.com/bar.html
      5 http://www.example.com/../a/../c/./d.html 
        http://www.example.com/a/c/d.html
    > -> $num, $source, $target {
        cmp-ok normalize($source), 'eq', $target, "Test $num";
}

The five test cases work fine:

$ perl6  normalize_url.p6
1..5
ok 1 - Test 1
ok 2 - Test 2
ok 3 - Test 3
ok 4 - Test 4
ok 5 - Test 5

The decoding percent-encoded octets is a bit more complicated than the others and it might help to explain it a bit further. The first line:

    s:g/'%'(<xdigit>**2)     # Decoding percent-encoded octets

looks for a literal % character followed by two hexadecimal digits. But the match really occurs only if the code assertion immediately thereafter:

       <?{ (+"0x$0") (elem) $unreserved-range }> # code assertion

is successful, that is essentially if the two hexadecimal digits found belong to the $unreserved set of unreserved characters populated at the top of the subroutine. As a result, the substitution occurs only for the octets listed in that set.

Here, we have used five test cases, one for each of the normalization actions, because we don't have detailed specifications, but a real test plan would require more test cases based on actual specs.

URL Normalization in Perl 5

As for the P6 version, we will apply a series of successive regex substitutions to the URL, one (or in one case two) for each of the normalization actions.

In the normalize subroutine of the program below, we topicalize the URL (with the for keyword), so that we can use directly the regex substitution operator on the topical $_ variable. This simplifies the substitutions. We can write simply:

s{/\./}{/}g;

instead of having to write, for each of the substitutions, something like:

$url =~ s{/\./}{/}g;;

Each of the substitutions in the program below is commented to explain to which normalization action it refers to.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say/;
use Test::More tests => 5;

sub normalize {
    my $url = shift;
    my %unreserved = map {$_ => 1} 0x41..0x5A, 0x61..0x7A, 0x2D, 0x2E, 0x5F, 0x7E;
    for ($url) {
        s/(\w+)/lc $1/ge;    # Lowercase letters
        s/(%\w\w)/uc $1/ge;  # Capitalizing letters in escape sequences
        # Decoding percent-encoded octets
        if (/%([[:xdigit:]]{2})/ and exists $unreserved{hex $1} ) {
            s/%([[:xdigit:]]{2})/chr hex "0x$1"/xge;
        }
        s{:80/}{/};          # Removing default port
        s{/\.\./}{/}g;       # Removing two-dots segments
        s{/\./}{/}g;         # Removing dot segments
    }
    return $url;
}

for ( [ 1, 'HTTP://www.Example.com/',              'http://www.example.com/' ],
      [ 2, 'http://www.example.com/a%c2%b1b',      'http://www.example.com/a%C2%B1b' ], 
      [ 3, 'http://www.example.com/%7Eusername/',  'http://www.example.com/~username/' ],
      [ 4, 'http://www.example.com:80/bar.html',   'http://www.example.com/bar.html' ],
      [ 5, 'http://www.example.com/../a/../c/./d.html', 'http://www.example.com/a/c/d.html' ]
    ) { 
        my ($num, $source, $target) = @$_;
        is normalize($source),  $target, "Test $num";
}

The five test cases work fine:

 $ perl normalize_url.pl
 1..5
 ok 1 - Test 1
 ok 2 - Test 2
 ok 3 - Test 3
 ok 4 - Test 4
 ok 5 - Test 5

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 25. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 20: Split String on Character Change and Amicable Numbers

These are some answers to the Week 20 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (August 11, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Split String on Character Change (P5 and P6)

Write a script to accept a string from command line and split it on change of character. For example, if the string is "ABBCDEEF", then it should split like "A", "BB", "C", "D", "EE", "F".

For this, it seemed fairly obvious to me that a simple regex in a one-liner should do the trick. Well, it turned out to be slightly more complicated that I anticipated in Perl 5. For example, running this very simple Perl 5 one-liner:

$ perl -E 'say join " ", "ABBCDEEF" =~ /((.)\2*)/g;'
A A BB B C C D D EE E F F

does more or less the splitting job correctly, but does not provide the desired output: we get one unwanted extra field for each wanted field. We can decide to filter out the unwanted fields:

$ perl -E 'my @a = "ABBCDEEF" =~ /((.)\2*)/g; say join " ", map $a[$_], grep {not $_ % 2} 0..$#a;'
A BB C D EE F

$ perl -E 'my @a = "ABBBCDEEF" =~ /((.)\2*)/g; say join " ", map $a[$_], grep {$_ % 2 == 0} 0..$#a;'
A BBB C D EE F

That seems to work fine.

But there is in fact a simpler way to do it. The reason for the repeated fields is that we have two pairs of capturing parentheses, and we need both of them for the regex to work properly. But we can easily print only one of the captures (i.e. only $1):

$ perl -E 'print "\"$1\" " while "ABBCDEEF" =~ /((.)\2*)/g;'
"A" "BB" "C" "D" "EE" "F"

For some reason, my original P5 try works fine with Perl 6 (with the necessary syntax changes) without having to filter out anything, as shown below:

$ perl6 -e 'say ~$/ if "ABBBCDEEF" ~~ m:g/((.)$0*)/;'
A BBB C D EE F

$ perl6 -e 'say ~$/ if "ABBCDEEF" ~~ m:g/((.)$0*)/;'
A BB C D EE F

Challenge # 2: Amicable Numbers

Write a script to print the smallest pair of Amicable Numbers. For more information, please checkout wikipedia page.

Amicable numbers are two different numbers so related that the sum of the proper divisors of each is equal to the other number. (A proper divisor of a number is a positive factor of that number other than the number itself. For example, the proper divisors of 6 are 1, 2, and 3.)

Amicable Numbers in Perl 5

We'll use the sum_divisors subroutine to find the divisors of a given number and return their sum. Then, we just loop over integers from 2 onward and call sum_divisors subroutine. If the sum of divisors is larger than the integer being examined (if it is smaller, then it is a number that we have already checked), then we check the sum of divisors of the sum of divisors. If it is equal to the integer, then we've found two amicable numbers.

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";

sub sum_divisors {
    my $num = shift;
    my $limit = int $num / 2 ;
    my $sum = 1;
    for my $test_div (2..$limit) {
        $sum += $test_div if $num % $test_div == 0;
    }
    return $sum;
}

my $i = 2;
while (1) {
    my $sum_div = sum_divisors $i;
    if ($sum_div > $i and $i == sum_divisors $sum_div) {
        say "$i and $sum_div are amicable numbers";
        last;
    }
    $i++
}

Note that since we don't know in advance how large the first amicable numbers will be, we build an infinite loop and break out of it when we've found the first amicable numbers.

This program displays the following correct result:

$ perl amicable_nrs.pl
220 and 284 are amicable numbers

Amicable Numbers in Perl 6

We'll also use a sum_divisors subroutine doing something similar to the one in the P5 solution (but doing it in a somewhat simpler way). And loop over a lazy infinite list of integers with essentially the same algorithm as the P5 implementation.

use v6;

sub sum-divisors (Int $num) {
    my @divisors = grep { $num %% $_ }, 2..($num / 2).Int;
    return [+] 1, | @divisors;
}

for 2..Inf -> $i {
    my $sum_div = sum-divisors $i;
    if $sum_div > $i and $i == sum-divisors $sum_div {
        say "$i and $sum_div are amicable numbers";
        last;
    }
}

This program prints the same thing as the P5 program:

$ perl6 amicable_nrs.p6
220 and 284 are amicable numbers

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 18. And, please, also spread the word about the Perl Weekly Challenge if you can.

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.