Perl Weekly Challenge # 21: Euler's Number and URL Normalizing

These are some answers to the Week 21 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (August 18, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Euler's Number

Write a script to calculate the value of e, also known as Euler’s number and Napier’s constant. Please checkout this wiki page) for more information.

The number e is a mathematical constant that is the base of natural logarithms: It is the unique number whose natural logarithm is equal to 1 and its value is approximately 2.71828.

Euler's Number in Perl 6

Perl 6 has a built-in constant, e, for Euler's number, which we can just print:

$ perl6 -e 'say e'
2.718281828459045

So, job done? Well, maybe it is sort of cheating. Let's try to compute e really.

Let's try the formula used by Jacob Bernoulli in 1683: e is equal to the limit, when n tends to infinity, of (1 + 1/n)**n. We can just use this formula with a large input number:

use v6;

sub eul ($n) { (1 + 1/$n)**$n}

sub MAIN (Int $n) {
    say eul $n;
}

Let's try to run this program with increasing input numbers:

$perl6  euler.p6 5
2.48832

$perl6  euler.p6 10
2.5937424601

$perl6  euler.p6 100
2.7048138294215263

$perl6  euler.p6 1000
2.7169239322358925

$perl6  euler.p6 10000
2.718145926825225

It works, but the formula converges very slowly: with an input number of 10,000, we obtain only 4 correct digits. Let's try with a better formula. Euler's constant is equal to the sum, for n from 0 to infinity, of 1/n!, where n! is the factorial of n, i.e. the product of all positive integers between 1 and n.

For computing this, we will first define a new postfix operator, !, to compute the factorial of any number, and then use it to compute the sum. For this, we will use twice the [...] reduction metaoperator, which reduces a list of values with the given infix operator. For example,

say [+] 1, 2, 3, 4;   #  -> 10

is equivalent to:

say 1 + 2 + 3 + 4;

i.e. works as if the infix operator (+ in this example) was placed between each item of the list to produce an arithmetic expression yielding a single numerical value. This is the perfect functionality for computing both the factorial of an integer and the sum of terms of the formula.

use v6;

sub postfix:<!> (Int $n) {   # factorial operator
    [*] 2..$n;
}
sub eul (Int $n) {
    [+] map { 1 / $_! }, 0..$n;
}
sub MAIN (Int $n) {
    say eul $n;
}

The version with this new formula converges much faster than the original one:

$ perl6  euler.p6 10
2.7182818

$ perl6  euler.p6 100
2.718281828459045

Euler's Number in Perl 5

We don't have a builtin constant for e in Perl 5, but we can cheat almost as easily as in Perl 6:

$ perl -E 'print exp 1'
2.71828182845905

But, of course, we don't want to cheat: that wouldn't be a real challenge.

We've seen with our Perl 6 coding experiments that computing Euler's constant as the limit of the sum, for n from 0 to infinity, the terms 1/n! is quite efficient, as the result converges rather fast. We'll use the same formula in Perl 5.

Although Perl 5 does not allow the construction of new operators (such as the factorial ! operator in our P6 script) and does not have the [] reduction metaoperator, we can easily write subroutines for the same purposes.

Although I don't use them very commonly (because I am stuck on several servers at $work with old versions of Perl), I'll use here the subroutine signatures feature.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say signatures/;
no warnings 'experimental::signatures';

sub fact ($n) {
    my $fact = 1;
    $fact *= $_ for 2..$n;
    return $fact;
}
sub eul ($n) {
    my $euler;
    $euler += 1 / fact $_ for 0..$n;
    return $euler;
}

say eul shift;

This works as expected and produces the following output:

$ perl euler.pl 10
2.71828180114638

$ perl euler.pl 100
2.71828182845905

The first test with input 10 displays 8 correct digits, and all digits are correct in the second test with input 100.

URL Normalization

Write a script for URL normalization based on rfc3986. This task was shared by Anonymous Contributor.

According to Wikipedia, URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized URL so it is possible to determine if two syntactically different URLs may be equivalent.

URL normalization does not appear to be a well normalized process. Some of the changes may be useful for some purposes and unwanted in others. In the scripts suggested below, I have limited the changes to normalizations that preserve semantics plus removing dot-segments among the normalizations that usually preserve semantics. Other normalization rules are often unwanted or poorly defined.

To summarize, we will perform the following normalization actions:

  • Converting to lower case,
  • Capitalizing letters in escape sequences,
  • Decoding percent-encoded octets of unreserved characters,
  • Removing the default port,
  • Removing dot-segments.

URL Normalization in Perl 6

We will simply apply a series of successive regex substitutions to the URL, one (or in one case two) for each of the normalization actions.

In the normalize subroutine of the program below, we topicalize the URL (with the given keyword), so that we can use directly the regex substitution operator on the topical $_ variable. This simplifies the substitutions. We can write simply:

s:g/'/./'/\//;

instead of having to write, for each of the substitutions, something like:

$url ~~ s:g/'/./'/\//;

Each of the substitutions in the program below is commented to explain to which normalization action it refers to.

use v6;
use Test;

sub normalize (Str $url is copy) {
    constant $unreserved = (0x41..0x5A, 0x61..0x7A, 0x2D, 0x2E, 0x5F, 0x7E).Set;
    given $url {
        s:g/(\w+)/{lc $0}/;      # Lowercase letters
        s:g/('%'\w\w)/{uc $0}/;  # Capitalizing letters in escape sequences
        s:g/'%'(<xdigit>**2)     # Decoding percent-encoded octets
           <?{ (+"0x$0") (elem) $unreserved }> # code assertion
           /{:16(~$0).chr}/;
        s/':' 80 '/'/\//;        # Removing default port
        s:g/'/../'/\//;          # Removing two-dots segments
        s:g/'/./'/\//;           # Removing dot segments
    }
    return $url;
}

plan 5;
for < 1 HTTP://www.Example.com/              
        http://www.example.com/
      2 http://www.example.com/a%c2%b1b      
        http://www.example.com/a%C2%B1b
      3 http://www.example.com/%7Eusername/  
        http://www.example.com/~username/
      4 http://www.example.com:80/bar.html   
        http://www.example.com/bar.html
      5 http://www.example.com/../a/../c/./d.html 
        http://www.example.com/a/c/d.html
    > -> $num, $source, $target {
        cmp-ok normalize($source), 'eq', $target, "Test $num";
}

The five test cases work fine:

$ perl6  normalize_url.p6
1..5
ok 1 - Test 1
ok 2 - Test 2
ok 3 - Test 3
ok 4 - Test 4
ok 5 - Test 5

The decoding percent-encoded octets is a bit more complicated than the others and it might help to explain it a bit further. The first line:

    s:g/'%'(<xdigit>**2)     # Decoding percent-encoded octets

looks for a literal % character followed by two hexadecimal digits. But the match really occurs only if the code assertion immediately thereafter:

       <?{ (+"0x$0") (elem) $unreserved-range }> # code assertion

is successful, that is essentially if the two hexadecimal digits found belong to the $unreserved set of unreserved characters populated at the top of the subroutine. As a result, the substitution occurs only for the octets listed in that set.

Here, we have used five test cases, one for each of the normalization actions, because we don't have detailed specifications, but a real test plan would require more test cases based on actual specs.

URL Normalization in Perl 5

As for the P6 version, we will apply a series of successive regex substitutions to the URL, one (or in one case two) for each of the normalization actions.

In the normalize subroutine of the program below, we topicalize the URL (with the for keyword), so that we can use directly the regex substitution operator on the topical $_ variable. This simplifies the substitutions. We can write simply:

s{/\./}{/}g;

instead of having to write, for each of the substitutions, something like:

$url =~ s{/\./}{/}g;;

Each of the substitutions in the program below is commented to explain to which normalization action it refers to.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say/;
use Test::More tests => 5;

sub normalize {
    my $url = shift;
    my %unreserved = map {$_ => 1} 0x41..0x5A, 0x61..0x7A, 0x2D, 0x2E, 0x5F, 0x7E;
    for ($url) {
        s/(\w+)/lc $1/ge;    # Lowercase letters
        s/(%\w\w)/uc $1/ge;  # Capitalizing letters in escape sequences
        # Decoding percent-encoded octets
        if (/%([[:xdigit:]]{2})/ and exists $unreserved{hex $1} ) {
            s/%([[:xdigit:]]{2})/chr hex "0x$1"/xge;
        }
        s{:80/}{/};          # Removing default port
        s{/\.\./}{/}g;       # Removing two-dots segments
        s{/\./}{/}g;         # Removing dot segments
    }
    return $url;
}

for ( [ 1, 'HTTP://www.Example.com/',              'http://www.example.com/' ],
      [ 2, 'http://www.example.com/a%c2%b1b',      'http://www.example.com/a%C2%B1b' ], 
      [ 3, 'http://www.example.com/%7Eusername/',  'http://www.example.com/~username/' ],
      [ 4, 'http://www.example.com:80/bar.html',   'http://www.example.com/bar.html' ],
      [ 5, 'http://www.example.com/../a/../c/./d.html', 'http://www.example.com/a/c/d.html' ]
    ) { 
        my ($num, $source, $target) = @$_;
        is normalize($source),  $target, "Test $num";
}

The five test cases work fine:

 $ perl normalize_url.pl
 1..5
 ok 1 - Test 1
 ok 2 - Test 2
 ok 3 - Test 3
 ok 4 - Test 4
 ok 5 - Test 5

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 25. And, please, also spread the word about the Perl Weekly Challenge if you can.

2 Comments

Interesting article. Makes me think about the accuracy and complexity of the algorithms. As for the challenge, you mean this one? A little confused by the dates.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.