Perl Weekly Challenge # 21: Euler's Number and URL Normalizing
These are some answers to the Week 21 of the Perl Weekly Challenge organized by Mohammad S. Anwar.
Spoiler Alert: This weekly challenge deadline is due in several days from now (August 18, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.
Challenge # 1: Euler's Number
Write a script to calculate the value of e, also known as Euler’s number and Napier’s constant. Please checkout this wiki page) for more information.
The number e is a mathematical constant that is the base of natural logarithms: It is the unique number whose natural logarithm is equal to 1 and its value is approximately 2.71828.
Euler's Number in Perl 6
Perl 6 has a built-in constant, e
, for Euler's number, which we can just print:
$ perl6 -e 'say e'
2.718281828459045
So, job done? Well, maybe it is sort of cheating. Let's try to compute e really.
Let's try the formula used by Jacob Bernoulli in 1683: e is equal to the limit, when n tends to infinity, of (1 + 1/n)**n
. We can just use this formula with a large input number:
use v6;
sub eul ($n) { (1 + 1/$n)**$n}
sub MAIN (Int $n) {
say eul $n;
}
Let's try to run this program with increasing input numbers:
$perl6 euler.p6 5
2.48832
$perl6 euler.p6 10
2.5937424601
$perl6 euler.p6 100
2.7048138294215263
$perl6 euler.p6 1000
2.7169239322358925
$perl6 euler.p6 10000
2.718145926825225
It works, but the formula converges very slowly: with an input number of 10,000, we obtain only 4 correct digits. Let's try with a better formula. Euler's constant is equal to the sum, for n from 0 to infinity, of 1/n!
, where n!
is the factorial of n
, i.e. the product of all positive integers between 1 and n
.
For computing this, we will first define a new postfix operator, !
, to compute the factorial of any number, and then use it to compute the sum. For this, we will use twice the [...]
reduction metaoperator, which reduces a list of values with the given infix operator. For example,
say [+] 1, 2, 3, 4; # -> 10
is equivalent to:
say 1 + 2 + 3 + 4;
i.e. works as if the infix operator (+
in this example) was placed between each item of the list to produce an arithmetic expression yielding a single numerical value. This is the perfect functionality for computing both the factorial of an integer and the sum of terms of the formula.
use v6;
sub postfix:<!> (Int $n) { # factorial operator
[*] 2..$n;
}
sub eul (Int $n) {
[+] map { 1 / $_! }, 0..$n;
}
sub MAIN (Int $n) {
say eul $n;
}
The version with this new formula converges much faster than the original one:
$ perl6 euler.p6 10
2.7182818
$ perl6 euler.p6 100
2.718281828459045
Euler's Number in Perl 5
We don't have a builtin constant for e in Perl 5, but we can cheat almost as easily as in Perl 6:
$ perl -E 'print exp 1'
2.71828182845905
But, of course, we don't want to cheat: that wouldn't be a real challenge.
We've seen with our Perl 6 coding experiments that computing Euler's constant as the limit of the sum, for n from 0 to infinity, the terms 1/n!
is quite efficient, as the result converges rather fast. We'll use the same formula in Perl 5.
Although Perl 5 does not allow the construction of new operators (such as the factorial !
operator in our P6 script) and does not have the []
reduction metaoperator, we can easily write subroutines for the same purposes.
Although I don't use them very commonly (because I am stuck on several servers at $work
with old versions of Perl), I'll use here the subroutine signatures feature.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say signatures/;
no warnings 'experimental::signatures';
sub fact ($n) {
my $fact = 1;
$fact *= $_ for 2..$n;
return $fact;
}
sub eul ($n) {
my $euler;
$euler += 1 / fact $_ for 0..$n;
return $euler;
}
say eul shift;
This works as expected and produces the following output:
$ perl euler.pl 10
2.71828180114638
$ perl euler.pl 100
2.71828182845905
The first test with input 10 displays 8 correct digits, and all digits are correct in the second test with input 100.
URL Normalization
Write a script for URL normalization based on rfc3986. This task was shared by Anonymous Contributor.
According to Wikipedia, URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized URL so it is possible to determine if two syntactically different URLs may be equivalent.
URL normalization does not appear to be a well normalized process. Some of the changes may be useful for some purposes and unwanted in others. In the scripts suggested below, I have limited the changes to normalizations that preserve semantics plus removing dot-segments among the normalizations that usually preserve semantics. Other normalization rules are often unwanted or poorly defined.
To summarize, we will perform the following normalization actions:
- Converting to lower case,
- Capitalizing letters in escape sequences,
- Decoding percent-encoded octets of unreserved characters,
- Removing the default port,
- Removing dot-segments.
URL Normalization in Perl 6
We will simply apply a series of successive regex substitutions to the URL, one (or in one case two) for each of the normalization actions.
In the normalize
subroutine of the program below, we topicalize the URL (with the given
keyword), so that we can use directly the regex substitution operator on the topical $_
variable. This simplifies the substitutions. We can write simply:
s:g/'/./'/\//;
instead of having to write, for each of the substitutions, something like:
$url ~~ s:g/'/./'/\//;
Each of the substitutions in the program below is commented to explain to which normalization action it refers to.
use v6;
use Test;
sub normalize (Str $url is copy) {
constant $unreserved = (0x41..0x5A, 0x61..0x7A, 0x2D, 0x2E, 0x5F, 0x7E).Set;
given $url {
s:g/(\w+)/{lc $0}/; # Lowercase letters
s:g/('%'\w\w)/{uc $0}/; # Capitalizing letters in escape sequences
s:g/'%'(<xdigit>**2) # Decoding percent-encoded octets
<?{ (+"0x$0") (elem) $unreserved }> # code assertion
/{:16(~$0).chr}/;
s/':' 80 '/'/\//; # Removing default port
s:g/'/../'/\//; # Removing two-dots segments
s:g/'/./'/\//; # Removing dot segments
}
return $url;
}
plan 5;
for < 1 HTTP://www.Example.com/
http://www.example.com/
2 http://www.example.com/a%c2%b1b
http://www.example.com/a%C2%B1b
3 http://www.example.com/%7Eusername/
http://www.example.com/~username/
4 http://www.example.com:80/bar.html
http://www.example.com/bar.html
5 http://www.example.com/../a/../c/./d.html
http://www.example.com/a/c/d.html
> -> $num, $source, $target {
cmp-ok normalize($source), 'eq', $target, "Test $num";
}
The five test cases work fine:
$ perl6 normalize_url.p6
1..5
ok 1 - Test 1
ok 2 - Test 2
ok 3 - Test 3
ok 4 - Test 4
ok 5 - Test 5
The decoding percent-encoded octets is a bit more complicated than the others and it might help to explain it a bit further. The first line:
s:g/'%'(<xdigit>**2) # Decoding percent-encoded octets
looks for a literal %
character followed by two hexadecimal digits. But the match really occurs only if the code assertion immediately thereafter:
<?{ (+"0x$0") (elem) $unreserved-range }> # code assertion
is successful, that is essentially if the two hexadecimal digits found belong to the $unreserved
set of unreserved characters populated at the top of the subroutine. As a result, the substitution occurs only for the octets listed in that set.
Here, we have used five test cases, one for each of the normalization actions, because we don't have detailed specifications, but a real test plan would require more test cases based on actual specs.
URL Normalization in Perl 5
As for the P6 version, we will apply a series of successive regex substitutions to the URL, one (or in one case two) for each of the normalization actions.
In the normalize
subroutine of the program below, we topicalize the URL (with the for
keyword), so that we can use directly the regex substitution operator on the topical $_
variable. This simplifies the substitutions. We can write simply:
s{/\./}{/}g;
instead of having to write, for each of the substitutions, something like:
$url =~ s{/\./}{/}g;;
Each of the substitutions in the program below is commented to explain to which normalization action it refers to.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw /say/;
use Test::More tests => 5;
sub normalize {
my $url = shift;
my %unreserved = map {$_ => 1} 0x41..0x5A, 0x61..0x7A, 0x2D, 0x2E, 0x5F, 0x7E;
for ($url) {
s/(\w+)/lc $1/ge; # Lowercase letters
s/(%\w\w)/uc $1/ge; # Capitalizing letters in escape sequences
# Decoding percent-encoded octets
if (/%([[:xdigit:]]{2})/ and exists $unreserved{hex $1} ) {
s/%([[:xdigit:]]{2})/chr hex "0x$1"/xge;
}
s{:80/}{/}; # Removing default port
s{/\.\./}{/}g; # Removing two-dots segments
s{/\./}{/}g; # Removing dot segments
}
return $url;
}
for ( [ 1, 'HTTP://www.Example.com/', 'http://www.example.com/' ],
[ 2, 'http://www.example.com/a%c2%b1b', 'http://www.example.com/a%C2%B1b' ],
[ 3, 'http://www.example.com/%7Eusername/', 'http://www.example.com/~username/' ],
[ 4, 'http://www.example.com:80/bar.html', 'http://www.example.com/bar.html' ],
[ 5, 'http://www.example.com/../a/../c/./d.html', 'http://www.example.com/a/c/d.html' ]
) {
my ($num, $source, $target) = @$_;
is normalize($source), $target, "Test $num";
}
The five test cases work fine:
$ perl normalize_url.pl
1..5
ok 1 - Test 1
ok 2 - Test 2
ok 3 - Test 3
ok 4 - Test 4
ok 5 - Test 5
Wrapping up
The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, August 25. And, please, also spread the word about the Perl Weekly Challenge if you can.
Interesting article. Makes me think about the accuracy and complexity of the algorithms. As for the challenge, you mean this one? A little confused by the dates.
Thank you for your comment.
Yes, this is the challenge I mean (Challenge 21, with answers due for tomorrow i.e. Sunday, Aug. 18, 2019).