September 2019 Archives

Perl Weekly Challenge 27: Intersection Point and Historical Values

These are some answers to the Week 27 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Challenge # 1: Intersection of Two Lines

Write a script to find the intersection of two straight lines. The co-ordinates of the two lines should be provided as command line parameter. For example:

The two ends of Line 1 are represented as co-ordinates (a,b) and (c,d).

The two ends of Line 2 are represented as co-ordinates (p,q) and (r,s).

The script should print the co-ordinates of point of intersection of the above two lines.

This is really elementary math, but, as I haven't used any linear algebra for many years, I needed about 10 to 15 minutes with a pencil and a piece of paper to work out how to find the equation of a straight line going through two points and how to compute the intersection of two lines. For the benefits of those of you in the same situation, let me briefly summarize how this works. You may jump to the next section if you don't need any refresher on these subjects.

The equation of a straight line is usually written as y = ax + b (or, in some countries, y = mx + b or y = mx + c, but it's just the name of the coefficients changing), where x and y are the coordinates of any point belonging to the line, a is the slope (or gradient, i.e. how steep the line is) of the line, and b the y-intercept (the value of y when x is zero, or the place where the line crosses the Y axis). The slope is the change in y divided by the change in x. For finding the slope of a line going through two points with coordinates x1, y1 and x2, y2, the slope a is the ordinate (y) difference of the points divided by their abscissa (x) difference:

a = (y2 - y1) / (x2 - x1)

Of course, we have a division by zero problem if x2 equals x1 (i.e. the line is vertical, at least in an orthonormal base or Cartesian plane), but we'll come back to that special edge case later.

For finding the y-intercept (b), you just need to reformulate y = ax + b as b = y - ax, and to replace a by the slope found with the above formula, and y and x by the coordinates of any of the two points.

For finding the intersection point of two lines y = a1 * x + b1 and y = a2 * x + b2, you need to figure out that it is the point of the lines for which the ordinate (y) is the same for an equal value of the abscissa (x), i.e. write the following equations:

a1 * x + b1 = a2 * x + b2
          <=>
(a1 - a2) *x = b2 - b1
          <=>
x = (b2 - b1) / (a1 - a2) # (if a1 != a2)

Once the abscissa x of the intersection has been found, it is easy to find its ordinate y using the equation of any of the two lines.

If the lines' slopes are equal, then the equation above has a division by zero problem, which reflects the fact that the line segments defined by the point pairs are parallel or colinear, meaning that the problem has no solution (there is no intersection point).

Intersection Point in Perl 5

With the above knowledge secured, it is fairly easy to write a Perl 5 program computing the intersection point of two lines defined by two point pairs.

We use the find_line subroutine twice (once for every point pair) to find the slope and y-intercept of each line and the find-intersect subroutine to find the coordinates of the point where the two lines intersect.

There is one slight complication, though: if one (and only one) of the point pairs have points with the same abscissa, we cannot write a linear equation for that pair of points, but the straight line is nonetheless well defined (provided the ordinates are different): it is a vertical line where all point have the same abscissa (x value). We cannot write an equation for such a line, but may still find the intersection point with the other line: it is the point of that other line having this abscissa. This pesky edge case accounts for a good chunk (20 code lines) of the code below.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub find_line {
    my ($x1, $y1, $x2, $y2) = @_;
    my $slope = ($y2 - $y1) / ($x2 - $x1);
    # find b for y1 = slope * x1 + b
    my $b = $y1 - $slope * $x1;
    return $slope, $b;
}

sub find_intersect {
    my ($a1, $b1, $a2, $b2) = @_;
    # solve y = ax + b for a1, b1 and a2, b2
    # i.e.: a1 x + b1 = a2 x + b2 <=> x (a1 - a2) = b2 - b1
    die "The segments are parallel or colinear, no intersection point!" if ($a1 == $a2);
    my $abscissa = ($b2 - $b1) / ($a1 - $a2);
    say "x = $abscissa";
    my $ordinate = $a1 * $abscissa + $b1;
    return $abscissa, $ordinate;
}
my ($a1, $b1, $a2, $b2);
if (@ARGV == 8) {
    die "The two segments are vertical, no intersection point"
        if $ARGV[0] == $ARGV[2] and $ARGV[4] == $ARGV[6];
    if ($ARGV[0] == $ARGV[2]) {
        #First segment is vertical
        my $abscissa = $ARGV[0];
        ($a2, $b2) = find_line @ARGV[4..7];
        my $ordinate = $a2 * $abscissa + $b2;
        say "Intersection point: $abscissa, $ordinate";
        exit 0;
    }
    if ($ARGV[4] == $ARGV[6]) {
        # Second segment is vertical
        my $abscissa = $ARGV[4];
        ($a1, $b1) = find_line @ARGV[0..3];
        my $ordinate = $a1 * $abscissa + $b1;
        say "Intersection point: $abscissa, $ordinate";
        exit 0;
    }        
    ($a1, $b1) = find_line @ARGV[0..3];
    ($a2, $b2) = find_line @ARGV[4..7];
} else {
    # default test values if arguments are missing or insufficient
    ($a1, $b1) = find_line 3, 1, 5, 3;
    ($a2, $b2) = find_line 3, 3, 6, 0;
}
say "a1: $a1";
say "b1: $b1";
say "a2: $a2";
say "b2: $b2";

my ($x, $y) = find_intersect ($a1, $b1, $a2, $b2);
say "Intersection point abscissa: $x";
say "Intersection point ordinate: $y";

Note that, in a real-life program, we should really check that the points of each pair are distinct (to properly define a straight line), but I did not want to clutter the code with even more edge cases: I'll assume that the user knows what she or he is doing and will pass arguments making sense. We'll do more of these argument checks in the Perl 6 version.

With no argument, the program uses the default values and duly prints the following:

$ perl intersection.pl
a1: 1
b1: -2
a2: -1
b2: 6
x = 4
Intersection point abscissa: 4
Intersection point ordinate: 2

These are a few example runs on edge cases:

$ perl intersection.pl 3 4 3 6 7 8 9 10
Intersection point: 3, 4

$ perl intersection.pl 3 4 3 6 8 8 9 10
Intersection point: 3, -2

$ perl intersection.pl 3 4 3 6 3 3 6 0
Intersection point: 3, 3

$ perl intersection.pl 4 4 4 6 3 3 6 0
Intersection point: 4, 2

$ perl intersection.pl 3 1 5 3 3 3 3 0
Intersection point: 3, 1

Intersection Point in Perl 6

We could simply translate the P5 program in Perl 6, but this type of problem calls for object-oriented programming. So, we will define a Point type and a Segment class (with two Point attributes) providing the slope and y-intercept methods to compute the equation of a line passing through the two points. The Point role also provides a gist method enabling pretty printing of the point coordinates when using the say built-in function on a Point instance.

use v6;

role Point {
    has $.x;
    has $.y;

    method gist {
        return "\n- Abscissa: $.x\n- Ordinate: $.y.";
    }
}
class Segment {
    has Point $.start;
    has Point $.end;

    method slope {
        return ($.end.y - $.start.y) / ($.end.x - $.start.x);
    }
    method y-intercept {
        my $slope = self.slope;
        return $.start.y - $slope * $.start.x;
    }
    method line-coordinates { # used only for debugging purpose
        return self.slope, self.y-intercept;
    }
}

sub compute-intersection (Segment $s1, Segment $s2) {
    my $abscissa = ($s2.y-intercept - $s1.y-intercept) /
                   ($s1.slope - $s2.slope);
    my $ordinate = $s1.slope * $abscissa + $s1.y-intercept;
    my $intersection = Point.new( x => $abscissa, y => $ordinate);
}

multi MAIN ( $a1, $b1, # start of line segment 1
             $a2, $b2, # end of line segment 1
             $a3, $b3, # start of line segment 2
             $a4, $b4  # end of line segment 2
         ) {
    my $segment1 = Segment.new(
                         start => Point.new(x => $a1, y => $b1),
                         end   => Point.new(x => $a2, y => $b2)
                              );
    my $segment2 = Segment.new(
                         start => Point.new(x => $a3, y => $b3),
                         end   => Point.new(x => $a4, y => $b4)
                              );
    say "Coordinates of intersection point: ", 
        compute-intersection $segment1, $segment2;
}
multi MAIN () { 
    say "Using default input values for testing. ";
    say "Should display point (2, 4).";
    my $segment1 = Segment.new(
                         start => Point.new(x => 3, y => 1),
                         end   => Point.new(x => 5, y => 3)
                              );
    my $segment2 = Segment.new(
                         start => Point.new(x => 3, y => 3),
                         end   => Point.new(x => 6, y => 0)
                              );
    # say "Segment 1: ", $segment1.line-coordinates;
    # say "Segment 2: ", $segment2.line-coordinates;
    say  "Coordinates of intersection point: ", 
        compute-intersection $segment1, $segment2;
}

This is a sample run of the program:

$ perl6  intersection.p6  3 1 5 3 3 3 6 0
Coordinates of intersection point:
- Abscissa: 4
- Ordinate: 2.

As it is, this program isn't doing any validation on its arguments. So we will add a valid-args subroutine for that purpose and also check that the computed segments are not parallel.

use v6;

role Point {
    has $.x;
    has $.y;

    method gist {
        return "\n- Abscissa: $.x\n- Ordinate: $.y.";
    }
}
class Segment {
    has Point $.start;
    has Point $.end;

    method slope {
        return ($.end.y - $.start.y) / ($.end.x - $.start.x);
    }
    method y-intercept {
        my $slope = self.slope;
        return $.start.y - $slope * $.start.x;
    }
    method line-coordinates {
        return self.slope, self.y-intercept;
    }
}
sub compute-intersection (Segment $s1, Segment $s2) {
    my $abscissa = ($s2.y-intercept - $s1.y-intercept) /
                   ($s1.slope - $s2.slope);
    my $ordinate = $s1.slope * $abscissa + $s1.y-intercept;
    my $intersection = Point.new( x => $abscissa, y => $ordinate);
}
multi MAIN ( $a1, $b1, # start of line segment 1
             $a2, $b2, # end of line segment 1
             $a3, $b3, # start of line segment 2
             $a4, $b4  # end of line segment 2
         ) {
    exit unless valid-args |@*ARGS;
    my $segment1 = Segment.new(
            start => Point.new(x => $a1, y => $b1),
            end   => Point.new(x => $a2, y => $b2)
                              );
    my $segment2 = Segment.new(
            start => Point.new(x => $a3, y => $b3),
            end   => Point.new(x => $a4, y => $b4)
                              );
    say "Segments are parallel or colinear." and exit 
        if $segment1.slope == $segment2.slope;
    say "Coordinates of intersection point: ", 
        compute-intersection $segment1, $segment2;
}
multi MAIN () { 
    say "Using default input values for testing. Should display poinr (2, 4).";
    my $segment1 = Segment.new(
            start => Point.new(x => 3, y => 1),
            end   => Point.new(x => 5, y => 3)
                              );
    my $segment2 = Segment.new(
            start => Point.new(x => 3, y => 3),
            end   => Point.new(x => 6, y => 0)
                              );
    say  "Coordinates of intersection point: ", 
        compute-intersection $segment1, $segment2;
}
sub valid-args ( $a1, $b1, # start of line segment 1
                 $a2, $b2, # end of line segment 1
                 $a3, $b3, # start of line segment 2
                 $a4, $b4  # end of line segment 2
         ) {
    unless @*ARGS.all ~~ /<[\d]>+/ {
        say "Non numeric argument. Can't continue.";
        return False;
    }
    if $a1 == $a2 and $b1 == $b2 {
        say "The first two points are the same. Cannot draw a line.";
        return False;
    }
    if $a3 == $a4 and $b3 == $b4 {
        say "The last two points are the same. Cannot draw a line.";
        return False;
    }
    if $a1 == $a2 and $a3 == $a4 {
        say "The two segments are vertical. No intersection.";
        return False;
    }
    if $a1 == $a2 {
        # First segment is vertical but not the second one
        my $segment2 = Segment.new(
                start => Point.new(x => $a3, y => $b3),
                end   => Point.new(x => $a4, y => $b4)
            );
        my $ordinate = $segment2.slope 
            * $a1 + $segment2.y-intercept;
        my $interception = Point.new(x => $a1, y => $ordinate);
        say "Coordinates of intersection point: ", $interception;
        return False;
    }
    if $a3 == $a4 {
        # Second segment is vertical but not the first one
        my $segment1 = Segment.new(
                start => Point.new(x => $a1, y => $b1),
                end   => Point.new(x => $a2, y => $b2)
            );
        my $ordinate = $segment1.slope 
            * $a3 + $segment1.y-intercept;
        my $interception = Point.new(x => $a3, y => $ordinate);
        say "Coordinates of intersection point: ", $interception;
        return False;
    }
    return True;
}

Running the program with some examples of valid or invalid arguments displays the following:

$ perl6  intersection.p6  3 1 5 3 3 3 n 0
Non numeric argument. Can't continue.

$ perl6  intersection.p6  3 1 5 3 3 3 5.4 0
Coordinates of intersection point:
- Abscissa: 3.888889
- Ordinate: 1.888889.

$ perl6  intersection.p6  3 1 5 3 3 3 6.0 0
Coordinates of intersection point:
- Abscissa: 4
- Ordinate: 2.

$ perl6  intersection.p6  3 1 5 3 6 2 10 6
Segments are parallel or colinear.

$ perl6  intersection.p6  3 1 3 1 3 3 6 0
The first two points are the same. Cannot draw a line.

$ perl6  intersection.p6  3 1 3 2 3 5 3 0
The two segments are vertical. Cannot find an intersection.

Challenge # 2: Displaying Historical Values

Write a script that allows you to capture/display historical data. It could be an object or a scalar. For example:

my $x = 10; $x = 20; $x -= 5;

This week was very busy for me and I'm running a bit out of time. My answers to this task will be somewhat minimalist.

Displaying Historical Values in Perl 5

I do not know whether it is possible to override the = operator in Perl 5 and have no time to find out. Another possibility might be to use the tie function or something similar to associate a value with an object, but I have used it only very rarely and have very limited experience with it, and I also don't have enough time to experiment with it. Therefore, I'll cheat a little bit and use an $assign code reference for assigning new values to a watched variable. The create_watched_value subroutine acts as a watched variable constructor and is a function factory that returns three closures, $assign, $get_past_values, and $get_current_values. The main code then uses these code references to perform various assignments and output.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub create_watched_value {
    my $value = shift;
    my @past_values;
    my $assign = sub {
        my $new_val = shift;
        push @past_values, $value;
        $value = $new_val;
    };
    my $get_past_values = sub {
        return "@past_values";
    };
    my $get_current_value = sub {
        return $value;
    };
    return $assign, $get_past_values, $get_current_value;
}

my ($assign, $get_past, $get_current) = create_watched_value 10;
say "Current: ", $get_current->();
$assign->(15);
say "Current: ", $get_current->();
$assign->(20);
say "Current: ", $get_current->();
$assign->(5);
say "Current: ", $get_current->();
say "Past: ", $get_past->();

This program runs fine and displays the following output:

$ perl watched-values.pl
Current: 10
Current: 15
Current: 20
Current: 5
Past: 10 15 20

Displaying Historical Values in Perl 6

I initially tried to redefine the = assignment operator but that appears to be impossible:

Cannot override infix operator '=', as it is a special form handled directly by the compiler

So, I decided to create my own =:= assignment operator for watched variables. Besides that, the program uses the WatchedValue class to enable the storing of current and past values.

use v6;

class WatchedValue {
    has Int $.current-value is rw;
    has @.past-values = ();

    method get-past-values {
        return @.past-values;
    }
}

multi sub infix:<=:=> (WatchedValue $y, Int $z) {
    push $y.past-values, $y.current-value;
    $y.current-value = $z;
}
my $x = WatchedValue.new(current-value => 10);
say "Current: ", $x.current-value;
$x =:= 15;
say "Current: ", $x.current-value;
$x =:= 5;
say "Current: ", $x.current-value;
$x =:= 20;
say "Current: ", $x.current-value;
say "Past values: ", $x.get-past-values;

When running the program; I get warnings for each assignment:

Useless use of "=:=" in expression "$x =:= 15" in sink context (line 18)

I do not know how to suppress these warnings (it seems that the no warnings ... pragma isn't implemented yet), but the program otherwise runs correctly and displays the successive values:

Current: 10
Current: 15
Current: 5
Current: 20
Past values: [10 15 5]

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, October 6. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge 26: Common Letters and Mean Angles

These are some answers to the Week 26 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (September 22, 2019). This blog post offers some solutions to this challenge. Please don't read on if you intend to complete the challenge on your own, which you're strongly encouraged to do.

Challenge # 1: Common Letters Count

Create a script that accepts two strings, let us call it, “stones” and “jewels”. It should print the count of “alphabet” from the string “stones” found in the string “jewels”. For example, if your stones is “chancellor” and “jewels” is “chocolate”, then the script should print “8”. To keep it simple, only A-Z,a-z characters are acceptable. Also make the comparison case sensitive.

We're given two strings and need to find out how many characters of the second string can be found in the first string.

Common Letters Count in Perl 5

This is straight forward. Our script should be given two arguments (else we abort the program). We split the first string into individual letters and store them in the %letters hash. Note that we filter out any character not in the [A-Za-z] character class. Then we split the second string into individual letters, keep only letters found in the %letters hash and finally coerce the resulting list of letters in a scalar context to transform it in a letter count (note that the scalar keyword isn't really needed here, as we have a scalar context anyway, but I included it to make it easier to understand).

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

@ARGV == 2 or die "This script needs two strings are parameters";
my ($str1, $str2) = @ARGV;
my %letters = map {$_ => 1} grep /[A-Za-z]/, split "", $str1;
my $count = scalar grep { exists $letters{$_}} split "", $str2;
say "$str2 has $count letters from $str1";

Running the program:

$ perl count_letters.pl chocolate chancellor
chancellor has 8 letters from chocolate

$ perl count_letters.pl chancellor chocolate
chocolate has 8 letters from chancellor

$ perl count_letters.pl chancellor CHOCOLATE
CHOCOLATE has 0 letters from chancellor

We get the expected result. The last test shows that the comparison is case-sensitive, as requested in the specification.

Common Letters Count in Perl 6

We will use more or less the same idea as in P5, except that we'll use a set instead of a hash for storing unique letters of the first string.

use v6;

sub MAIN (Str $str1, Str $str2) {
    my $letters = $str1.comb.grep( /<[A..Za..z]>/ ).Set;
    my $count = $str2.comb.grep( { $_ (elem) $letters} ).elems;
    say "$str2 has $count letters from $str1";
}

This works as expected:

$ perl6 count_letters.p6 chocolate chancellor
chancellor has 8 letters from chocolate

$ perl6 count_letters.p6 chocolate CHANCELLOR
CHANCELLOR has 0 letters from chocolate

Mean Angles

Create a script that prints mean angles of the given list of angles in degrees. Please read wiki page that explains the formula in details with an example.

In mathematics, a mean of circular quantities is a mean which is sometimes better-suited for quantities like angles, day times, and fractional parts of real numbers. This is necessary since most of the usual means may not be appropriate on circular quantities. For example, the arithmetic mean of 0° and 360° is 180°, which is misleading because for most purposes 360° is the same thing as 0°.

A common formula for the mean of a list of angles is:

angle-mean.jpg

We just need to apply the formula, after having converted the input values from degrees to radians.

The Wikipedia page has the following example, that we will use in our tests: consider the following three angles as an example: 10, 20, and 30 degrees. Intuitively, calculating the mean would involve adding these three angles together and dividing by 3, in this case indeed resulting in a correct mean angle of 20 degrees. By rotating this system anticlockwise through 15 degrees the three angles become 355 degrees, 5 degrees and 15 degrees. The naive mean is now 125 degrees, which is the wrong answer, as it should be 5 degrees.

Mean Angles in Perl 5

There are a number of modules that could be used here to convert degrees to radians and radians to degrees, to compute arithmetic means and perhaps even to compute directly mean angles. But that wouldn't be a challenge if we were just using modules to dodge the real work.

So I wrote the deg2rad and rad2deg subroutines to do the angle unit conversions, and computed the arithmetic means of sines and cosines in a for loop.

As I do not have a use for such a program, I will implement the necessary subroutine and just use them in a series of tests.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use constant PI => atan2(1, 0) * 2;
use Test::More;
plan tests => 9;


sub deg2rad { return $_[0] * PI /180; }
sub rad2deg { return $_[0] * 180 / PI }

sub mean {
    my @angles = map { deg2rad $_ } @_;
    my $count = @angles;
    my ($sum_sin, $sum_cos) = (0, 0);
    for my $angle (@angles) {
        $sum_sin += sin $angle;
        $sum_cos += cos $angle;
    }
    return rad2deg atan2 $sum_sin/$count, $sum_cos/$count;
}

is deg2rad(0), 0, "To rad: 0 degree";
is deg2rad(90), PI/2, "To rad: 90 degrees";
is deg2rad(180), PI, "To rad: 180 degrees";
is rad2deg(PI/2), 90, "To degrees: 90 degrees";
is rad2deg(PI), 180, "To degrees: 180 degrees";
is deg2rad(rad2deg(PI)), PI, "Roundtrip rad -> deg -> rad";
is rad2deg(deg2rad(90)), 90, "Roundtrip deg -> rad -> deg";
is mean(10, 20, 30), 20, "Mean of 10, 20, 30 degrees";
is mean(355, 5, 15), 5, "Mean of 355, 5, 15 degrees";

Running the tests displays the following:

$ perl angle-mean.pl
1..9
ok 1 - To rad: 0 degree
ok 2 - To rad: 90 degrees
ok 3 - To rad: 180 degrees
ok 4 - To degrees: 90 degrees
ok 5 - To degrees: 180 degrees
ok 6 - Roundtrip rad -> deg -> rad
ok 7 - Roundtrip deg -> rad -> deg
ok 8 - Mean of 10, 20, 30 degrees
ok 9 - Mean of 355, 5, 15 degrees

Update: As pointed out in a comment by Saif below, there is no need to divide both arguments of the atan2 built-in function: these arguments represent the abscissa and the ordinate of a point in the plan. Whether the two Cartesian coordinates are divided by count or not does not change the resulting polar angle calculated by atan2. Thus, we don't need to perform this division, and we don't even need the $count variable. The mean subroutine can be simplified as follows:

sub mean {
    my @angles = map { deg2rad $_ } @_;
    my ($sum_sin, $sum_cos) = (0, 0);
    for my $angle (@angles) {
        $sum_sin += sin $angle;
        $sum_cos += cos $angle;
    }
    return rad2deg atan2 $sum_sin, $sum_cos;
}

The tests display the same results as before.

End update.

Mean Angles in Perl 6

We will use essentially the same idea as in P5.

use v6;
use Test;

sub deg2rad (Numeric $deg) { return $deg * pi /180; }
sub rad2deg (Numeric $rad) { return $rad * 180 / pi }

sub mean (*@degrees) {
    my @radians = map { deg2rad $_ }, @degrees;
    my $count = @radians.elems;
    my $avg-sin = ([+] @radians.map( {sin $_})) / $count; 
    my $avg-cos = ([+] @radians.map( {cos $_})) / $count; 
    return rad2deg atan2 $avg-sin, $avg-cos;
}
plan 9;
is deg2rad(0), 0, "To rad: 0 degree";
is deg2rad(90), pi/2, "To rad: 90 degrees";
is deg2rad(180), pi, "To rad: 180 degrees";
is rad2deg(pi/2), 90, "To degrees: 90 degrees";
is rad2deg(pi), 180, "To degrees: 180 degrees";
is deg2rad(rad2deg(pi)), pi, "Roundtrip rad -> deg -> rad";
is rad2deg(deg2rad(90)), 90, "Roundtrip deg -> rad -> deg";
is-approx mean(10, 20, 30), 20, "Mean of 10, 20, 30 degrees";
is-approx mean(355, 5, 15), 5, "Mean of 355, 5, 15 degrees";

And this is the output produced when running the script:

perl6  angle-mean.p6
1..9
ok 1 - To rad: 0 degree
ok 2 - To rad: 90 degrees
ok 3 - To rad: 180 degrees
ok 4 - To degrees: 90 degrees
ok 5 - To degrees: 180 degrees
ok 6 - Roundtrip rad -> deg -> rad
ok 7 - Roundtrip deg -> rad -> deg
ok 8 - Mean of 10, 20, 30 degrees
ok 9 - Mean of 355, 5, 15 degrees

Note that I had to use the is-approx function of the Test module (instead of the simple is function) for tests computing the mean because I would otherwise get failed tests due to rounding issues:

# Failed test 'Mean of 10, 20, 30 degrees'
# at angle-mean.p6 line 22
# expected: '20'
#      got: '19.999999999999996'
not ok 9 - Mean of 355, 5, 15 degrees

As you can see, the program computes 19.999999999999996, where I expect 20, which is nearly the same numeric value.

I actually expected similar problems with Perl 5, but, for some reason, it did not occur. Perhaps the P5 Test::More module has a built-in approximate numeric comparison that silently takes care of such problems.

Update: as note above in the P5 section of this task following Saif's comment, we don't really need to divide the arguments of the atan2 built-in function by the number of angles. The mean subroutine can be simplified as follows:

sub mean (*@degrees) {
    my @radians = map { deg2rad $_ }, @degrees;
    my $sum-sin = [+] @radians.map( {sin $_}); 
    my $sum-cos = [+] @radians.map( {cos $_}); 
    return rad2deg atan2 $sum-sin, $sum-cos;
}

The tests display the same results as before.

End update.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September, 29. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge 25: Pokémon Sequence and Chaocipher

These are some answers to the Week 25 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (September 15, 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Sequence of Pokémon Names

Generate a longest sequence of the following English Pokemon names where each name starts with the last letter of previous name.ù

audino bagon baltoy banette bidoof braviary bronzor carracosta charmeleon cresselia croagunk darmanitan deino emboar emolga exeggcute gabite girafarig gulpin haxorus heatmor heatran ivysaur jellicent jumpluff kangaskhan kricketune landorus ledyba loudred lumineon lunatone machamp magnezone mamoswine nosepass petilil pidgeotto pikachu pinsir poliwrath poochyena porygon2 porygonz registeel relicanth remoraid rufflet sableye scolipede scrafty seaking sealeo silcoon simisear snivy snorlax spoink starly tirtouga trapinch treecko tyrogue vigoroth vulpix wailord wartortle whismur wingull yamask

First, an assumption: each name in the sequence must appear only once, because if there could be duplicates, then it wouldn't be difficult to find an infinite cyclical sequence and easily win the prize for the longest sequence. Therefore, when we use a name at some point in a sequence, it should be removed from the list of names authorized afterwards in the same sequence. We also assume that the longest sequence means the sequence with the largest number of names (not the largest number of letters). One comment, finally: one of the Pokémons is named "porygon2"; since no name starts with a digit, this name cannot be used within a sequence, but at best as the final item of a sequence.

Longest Sequence of Pokémons in Perl 5

The first version of my program did not handle the case where there are several sequences, but it still printed the largest sequence count each time it was updated. And it appeared immediately that there were many sequences (1248) with the highest count (23 names). So I changed the code to record all the sequences with the highest count.

The first thing that the program does is to populate a hash with arrays of words starting with the same letter (that letter being the key in the hash). This way, when we look for a successor in a sequence, we only look at names stating with the right letter. The program also maintains a $seen hash reference to filter out names that have already been used in a sequence.

The program is using brute force, i.e. trying every legal sequence. Each time we've found a sequence that can no longer be augmented, we need to backtrack. The easiest way to implement a backtracking algorithm is to use recursion. So, our search_seq calls itself recursively each time we want to add a new name to a sequence.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my @names = qw /audino bagon baltoy banette bidoof braviary bronzor 
                carracosta charmeleon cresselia croagunk darmanitan 
                deino emboar emolga exeggcute gabite girafarig 
                gulpin haxorus heatmor heatran ivysaur jellicent 
                jumpluff kangaskhan kricketune landorus ledyba 
                loudred lumineon lunatone machamp magnezone 
                mamoswine nosepass petilil pidgeotto pikachu pinsir 
                poliwrath poochyena porygon2 porygonz registeel 
                relicanth remoraid rufflet sableye scolipede scrafty 
                seaking sealeo silcoon simisear snivy snorlax spoink
                starly tirtouga trapinch treecko tyrogue vigoroth 
                vulpix wailord wartortle whismur wingull yamask/;

my %name_by_letter;
for my $name (@names) {
    my $start_letter = substr $name, 0, 1;
    push @{$name_by_letter{$start_letter}}, $name;
}

my @best_seq;
my $best_count = 0;
for my $name (@names) {
    search_seq( [$name], {$name => 1} );
}
say "BEST SEQUENCES: ";
for my $item (@best_seq) {
   say "@$item";
}
say "Number of sequences: ", scalar @best_seq;

sub search_seq {
    my ($current_seq, $seen) = @_;
    my $last_name = $current_seq->[-1];
    my $last_letter = substr $last_name, -1, 1;
    my @next_candidates = grep { not exists $seen->{$_} }   
        @{$name_by_letter{$last_letter}};
    if (scalar @next_candidates == 0) {
        my $count = scalar @$current_seq;
        if ($count > $best_count) {
            @best_seq = ($current_seq);
            $best_count = $count;
        } elsif ($count == $best_count) {
            push @best_seq, $current_seq;
        }
    } else {
        for my $name (@next_candidates) {
            my %local_seen = %$seen;
            $local_seen{$name} = 1;
            search_seq ([@$current_seq, $name], {%local_seen});
        }
    }
}

As already mentioned, the best sequence count is 23 names, and the program detects 1248 sequences with that name count. So, I will provide only a few lines of the output:

machamp petilil landorus scrafty yamask kricketune emboar registeel loudred darmanitan nosepass simisear relicanth heatmor rufflet trapinch haxorus seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar registeel loudred darmanitan nosepass simisear rufflet trapinch heatmor relicanth haxorus seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar relicanth haxorus simisear rufflet trapinch heatmor registeel loudred darmanitan nosepass seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar relicanth heatmor registeel loudred darmanitan nosepass simisear rufflet trapinch haxorus seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar relicanth heatmor rufflet trapinch haxorus simisear registeel loudred darmanitan nosepass seaking girafarig gabite exeggcute emolga audino
machamp petilil landorus scrafty yamask kricketune emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass seaking girafarig gabite exeggcute emolga audino

The program runs in about 35 seconds. This is sort of acceptable, but still a bit too long in my view. The main problem is that adding just a few dozen names is very likely to make the performance totally unacceptable. I can think about a few micro-optimizations, but I'm not really interested with these. And I don't see any simple way to significantly improve performance. Well, yes, there might be a possibility: each time we explore a name, we could keep track of the longest sequence it has generated, so that when we explore a new name and find the first name, we could use the longest sequence. But it's not very easy, because that works only if that longest sequence does not have any of the names previously used. Overall, I'm not really convinced we would be able to add full longest subsequences very often.

I tried to pass the last added name as a parameter to the search_seq subroutine to avoid having to look for it in the current sequence, but that doesn't run any faster (possibly even slightly slower); it seems that the overhead of the additional argument is larger than the cost of dereferencing the last item of the current sequence. I also tried to populate a hash giving directly a list of possible successors for each name in the list (to avoid having to check repeatedly the last letter of the last word added), but that does not bring any significant speed improvement.

Longest Sequence of Pokémons in Perl 6

We'll use basically the same algorithm in Perl 6. Using sets or similar collections (in fact sethashes), with the mathematical set operations, will make the syntax a bit easier.

use v6;

my @names = < audino bagon baltoy banette bidoof braviary bronzor  
                carracosta charmeleon cresselia croagunk darmanitan 
                deino emboar emolga exeggcute gabite girafarig 
                gulpin haxorus heatmor heatran ivysaur jellicent 
                jumpluff kangaskhan kricketune landorus ledyba 
                loudred lumineon lunatone machamp magnezone 
                mamoswine nosepass petilil pidgeotto pikachu pinsir 
                poliwrath poochyena porygon2 porygonz registeel 
                relicanth remoraid rufflet sableye scolipede scrafty 
                seaking sealeo silcoon simisear snivy snorlax spoink
                starly tirtouga trapinch treecko tyrogue vigoroth 
                vulpix wailord wartortle whismur wingull yamask >;

my %name-by-letter;
for @names -> $name {
    my $start-letter = substr $name, 0, 1;
    push %name-by-letter{$start-letter}, $name;
}

my @best-seq;
my $best-count = 0;
for @names -> $name {
    search-seq( [$name], $name.SetHash );
}
say "BEST SEQUENCES: ";
for @best-seq -> $item {
   say "$item";
}
say "Number of sequences: ", @best-seq.elems;
say now - INIT now;

sub search-seq (@current-seq, $seen) {
    my $last-name = @current-seq[*-1];
    my $last-letter = substr $last-name, *-1, 1;
    my @next-candidates = grep {defined $_}, # Remove empty slots
        (@(%name-by-letter{$last-letter}) (-) $seen).keys;
    if ( @next-candidates.elems == 0) {
        my $count = @current-seq.elems;
        if $count > $best-count {
            @best-seq = @current-seq;
            $best-count = $count;
        } elsif ($count == $best-count) {
            push @best-seq, @current-seq;
        }
    } else {
        for @next-candidates -> $name {
            my @new-seq = | @current-seq, $name;
            search-seq( @new-seq, $seen ∪ $name.SetHash );
        }
    }
}

Again, we copy only a small fraction of the output:

machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch heatmor registeel loudred darmanitan nosepass simisear relicanth haxorus scrafty yamask kricketune emolga audino
machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass scrafty yamask kricketune emolga audino
machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass snivy yamask kricketune emolga audino
machamp petilil landorus seaking girafarig gabite exeggcute emboar rufflet trapinch haxorus simisear relicanth heatmor registeel loudred darmanitan nosepass starly yamask kricketune emolga audino

So this works, but the Perl 6 program now runs in more than 8 minutes. I have to think harder about optimizations or preferably a better algorithm.

Update Sept. 11: In his comment below, Timo Paulssen suggested that the grep in this statement:

    my @next-candidates = grep {defined $_}, # Remove empty slots
        (@(%name-by-letter{$last-letter}) (-) $seen).keys;

is slowing down significantly the program. For some reason, the correction he suggested wasn't really successful (I probably did something wrong), but removing the grep by changing the statement to this:

    my @next-candidates = %name-by-letter{$last-letter} ??
        (@(%name-by-letter{$last-letter}) (-) $seen).keys !! ();

reduced the execution time to four and a half minutes. I don't understand why this simple grep is taking so much time (not far from half of the total time), but that's a very good improvement.

I also tried to populate a hash giving directly a list of possible successors for each name in the list (to avoid having to check repeatedly the last letter of the last word added), but that does not bring any significant speed improvement (a win of about ten seconds).

End update.

I'll still try to think about a better algorithm, if time permits, and come back if I find something of interest.

Challenge 2: Implementation of Chaocypher

Create script to implement Chaocipher. Please checkout wiki page for more information.

According to the linked Wikipedia page, the Chaocipher is a cipher method invented by John Francis Byrne in 1918 and described in his 1953 autobiographical Silent Years. He believed Chaocipher was simple, yet unbreakable. He offered cash rewards for anyone who could solve it. In May 2010, the Byrne family donated all Chaocipher-related papers and artifacts to the National Cryptologic Museum in Ft. Meade, Maryland, USA. This led to the disclosure of the Chaocipher algorithm in a paper entitled Chaocypher Revealed: the Algorithm (2010), by Moshe Rubin.

How the Chaocipher works

The Chaocipher system consists of two alphabets, with the "right" alphabet used for locating the plaintext letter while the other ("left") alphabet is used for reading the corresponding ciphertext letter. In other words, the basis of the method is a simple substitution. The novel idea in the Chaocipher algorithm, however, is that the two alphabets are slightly modified after each input plaintext letter is enciphered. This leads to nonlinear and highly diffused alphabets as encryption progresses.

Although Byrne had in mind a physical model with rotating wheels, we will follow Rubin's algorithmic explanation of the method and represent each of the two alphabets as a 26-character string consisting of a permutation of the standard alphabet, for example:

            +            *
LEFT (ct):  HXUCZVAMDSLKPEFJRIGTWOBNYQ 
RIGHT (pt): PTLNBQDEOYSFAVZKGJRIHWXUMC

The place marked with a + sign and a * sign are called by Byrne the zenith and nadir points and they correspond to the first and the fourteenth positions in the alphabet. They are important for the alphabet permutation that will be performed after each ciphering and deciphering step.

The right alphabet (bottom) is used for finding the plain text letter, while the left alphabet (top) is used for finding the corresponding cipher text letter.

To encipher the plaintext letter "A," we simply look for this letter in the right alphabet and take the corresponding letter ("P") in the left alphabet (ct and pt stand for cipher text and plain text).

Each time a letter has been encrypted (or decrypted), we proceed with permutations of the alphabets. To permute the left alphabet, we will:

  • Shift the whole alphabet cyclically, so that the letter just enciphered ("P") is moved to the zenith (first) position;

    LEFT (ct):  PEFJRIGTWOBNYQHXUCZVAMDSLK
    

    Remove temporarily the letter in the second position (or zenith + 1), "E" in our example, leaving a "hole" in this position:

    LEFT (ct):  P.FJRIGTWOBNYQHXUCZVAMDSLK
    

    Shift one position to the left all letters between the second position and the nadir position, leaving a hole in the nadir position:

    LEFT (ct):  PFJRIGTWOBNYQ.HXUCZVAMDSLK
    
  • And finally insert the letter that has been removed ("E") in the nadir position:

    LEFT (ct):  PFJRIGTWOBNYQEHXUCZVAMDSLK
    

    Permuting the right alphabet is a similar process, but with some small but important differences that I will not describe here: please refer to Rubin's document to find the details.

After the permutation of the right alphabet, the two alphabets look like this:

LEFT (ct):  PFJRIGTWOBNYQEHXUCZVAMDSLK
RIGHT (pt): VZGJRIHWXUMCPKTLNBQDEOYSFA

With these new alphabets, we are now ready to encrypt the second letter of the plain text. Then we permute again both alphabets and proceed with the third letter of the plain text. And so on.

Deciphering the cipher text is the same process, except of course that we need to locate the first letter of the cipher text in the left alphabet and pick up the corresponding letter in the right alphabet. Alphabet permutations then follow exactly the same rules as when enciphering the plain text.

The strength of the Chaocipher is that the encryption key (the two alphabets) is changed each time a letter of the input text is processed, and the way it is changed depends on the content of the input message. In effect, this is an advanced form of an autokey cipher that is very difficult to break.

Chaocipher Implementation in Perl 5

For our alphabets, we could use strings of characters, arrays of letters or even hashes. Operations on strings of characters are usually reasonably fast and efficient, so I settled for that. Since both alphabets need to be permuted at the same time, I decided to write only one subroutine (permute_alphabets) to permute both alphabets at the same time: at least, there is no risk to permute one and forget to permute the other. I included some tests based on Rubin's paper examples.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;

sub permute_alphabets {
    my ($left, $right, $pos) = @_;
    my $newleft = substr ($left, $pos) . substr $left, 0, $pos;
    $newleft = substr ($newleft, 0, 1) . substr ($newleft, 2, 12) 
               . substr ($newleft, 1, 1) . substr $newleft, 14;

    my $newright = substr ($right, $pos+1) . substr $right, 0, $pos+1;
    $newright = substr ($newright, 0, 2) . substr ($newright, 3, 11) 
                . substr ($newright, 2, 1) . substr $newright, 14;
    return ($newleft, $newright);
}

sub run_tests {
    use Test::More; # Minimal tests for providing an example
    plan tests => 4;
    my $left  = 'HXUCZVAMDSLKPEFJRIGTWOBNYQ';
    my $right = 'PTLNBQDEOYSFAVZKGJRIHWXUMC';
    my $position = index $right, 'A';
    my ($newleft, $newright) = permute_alphabets $left, $right, 
        $position;
    is $newleft, 'PFJRIGTWOBNYQEHXUCZVAMDSLK', 
        "Left alphabet: $newleft";
    is $newright, 'VZGJRIHWXUMCPKTLNBQDEOYSFA', 
        "Right alphabet: $newright";
    my $plaintext = "WELLDONEISBETTERTHANWELLSAID";
    my $ciphertext = encipher($plaintext, $left, $right);
    is $ciphertext, 'OAHQHCNYNXTSZJRRHJBYHQKSOUJY', 
        "Testing enciphering: $ciphertext";
    my $deciphered = decipher($ciphertext, $left, $right);
    is $deciphered, $plaintext, "Roundtrip: $deciphered";
}

sub encipher {
    my ($plaintext, $left, $right) = @_;
    my $ciphertext = "";
    my @letters = split //, $plaintext;
    for my $let (@letters) {
        my $position = index $right, $let;
        $ciphertext .= substr $left, $position, 1;
        ($left, $right) = permute_alphabets ($left, $right, 
            $position);
    }
    return $ciphertext;
}

sub decipher {
    my ($ciphertext, $left, $right) = @_;
    my $plaintext = "";
    my @letters = split //, $ciphertext;
    for my $let (@letters) {
        my $position = index $left, $let;
        $plaintext .= substr $right, $position, 1;
        ($left, $right) = permute_alphabets ($left, $right, 
            $position);
    }
    return $plaintext;
}

if (@ARGV == 0) {
    run_tests;
} else {
    die "Invalid number of arguments: we need 4 arguments.\n" 
        unless @ARGV == 4;
    my ($mode, $text, $left, $right) = @ARGV;
    if ($mode eq 'encipher') {
        say encipher($text, $left, $right);
    } elsif ($mode eq 'decipher') {
        say decipher($text, $left, $right);
    } else {
        die "Invalid mode: must be 'encipher' or 'decipher'.\n";
    }
}

We can either launch the program without any argument to run the tests, or pass four arguments (mode, text, left alphabet and right alphabet) to encipher or decipher the text.

This is an example of the output:

$ perl chaocipher.pl
1..4
ok 1 - Left alphabet: PFJRIGTWOBNYQEHXUCZVAMDSLK
ok 2 - Right alphabet: VZGJRIHWXUMCPKTLNBQDEOYSFA
ok 3 - Testing enciphering: OAHQHCNYNXTSZJRRHJBYHQKSOUJY
ok 4 - Roundtrip: WELLDONEISBETTERTHANWELLSAID

$ perl chaocipher.pl encipher WELLDONEISBETTERTHANWELLSAID HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
OAHQHCNYNXTSZJRRHJBYHQKSOUJY

$ perl chaocipher.pl decipher OAHQHCNYNXTSZJRRHJBYHQKSOUJY HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
WELLDONEISBETTERTHANWELLSAID

In a real life program, we would need to do a few other things to make it more robust, including especially a proper validation of the arguments (only upper case ASCII letters in the input text, complete alphabets, etc.). This is left as an exercise to the reader ).

Chaocipher Implementation in Perl 6

Besides the minor syntax differences between Perl 5 and Perl 6, there no reason to change the way the enciphering/deciphering algorithm operates. We will use multi MAIN subroutines to decide on whether to run tests or to process a string passed to the program. There are a couple of minor added features. We declare an uppercase subset of the string type to provide a better (still limited) validation of subroutine arguments. And we fold the case of the program arguments to what is needed.

use v6;
subset UcStr of Str where /^<[A..Z]>+$/;

sub permute-alphabets (UcStr $left is copy, UcStr $right is copy, UInt $pos) {
    $left = substr($left, $pos) ~ substr $left, 0, $pos;
    $left = substr($left, 0, 1) ~ substr($left, 2, 12) 
            ~ substr($left, 1, 1) ~ substr $left, 14;

    $right = substr($right, $pos+1) ~ substr $right, 0, $pos+1;
    $right = substr($right, 0, 2) ~ substr($right, 3, 11) 
             ~ substr($right, 2, 1) ~ substr $right, 14;
    return ($left, $right);
}

sub run_tests {
    use Test; 
    plan 4;
    my $left  = 'HXUCZVAMDSLKPEFJRIGTWOBNYQ';
    my $right = 'PTLNBQDEOYSFAVZKGJRIHWXUMC';
    my $position = index $right, 'A';
    my ($newleft, $newright) = permute-alphabets $left, $right,
        $position;
    is $newleft, 'PFJRIGTWOBNYQEHXUCZVAMDSLK', 
        "Left alphabet: $newleft";
    is $newright, 'VZGJRIHWXUMCPKTLNBQDEOYSFA', 
        "Right alphabet: $newright";
    my $plaintext = "WELLDONEISBETTERTHANWELLSAID";
    my $ciphertext = encipher($plaintext, $left, $right);
    is $ciphertext, 'OAHQHCNYNXTSZJRRHJBYHQKSOUJY', 
        "Testing enciphering: $ciphertext";
    my $deciphered = decipher($ciphertext, $left, $right);
    is $deciphered, $plaintext, "Roundtrip: $deciphered";
}

sub encipher (UcStr $plaintext, UcStr $left is copy, UcStr $right is copy) {
    my $ciphertext = "";
    for $plaintext.comb -> $let {
        my $position = index $right, $let;
        $ciphertext ~= substr $left, $position, 1;
        ($left, $right) = permute-alphabets $left, $right,
            $position;
    }
    return $ciphertext;
}

sub decipher (UcStr $ciphertext, UcStr $left is copy, UcStr $right is copy) {
    my $plaintext = "";
    for $ciphertext.comb -> $let {
        my $position = index $left, $let;
        $plaintext ~= substr $right, $position, 1;
        ($left, $right) = permute-alphabets $left, $right, 
            $position;
    }
    return $plaintext;
}

multi MAIN () {
    run_tests;
} 
multi MAIN (Str $mode, Str $text, Str $left, Str $right) {  
    if $mode.lc eq 'encipher' {
        say encipher $text.uc, $left.uc, $right.uc;
    } elsif $mode.lc eq 'decipher' {
        say decipher $text.uc, $left.uc, $right.uc;
    } else {
        die "Invalid mode $mode: must be 'encipher' or 'decipher'.\n";
    }
}

And this is a sample output with various arguments:

$ perl6 chaocipher.p6 encipher WELLDONEISBETTERTHANWELLSAID HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
OAHQHCNYNXTSZJRRHJBYHQKSOUJY

$ perl6 chaocipher.p6  decipher OAHQHCNYNXTSZJRRHJBYHQKSOUJY HXUCZVAMDSLKPEFJRIGTWOBNYQ PTLNBQDEOYSFAVZKGJRIHWXUMC
WELLDONEISBETTERTHANWELLSAID

$ perl6 chaocipher.p6
1..4
ok 1 - Left alphabet: PFJRIGTWOBNYQEHXUCZVAMDSLK
ok 2 - Right alphabet: VZGJRIHWXUMCPKTLNBQDEOYSFA
ok 3 - Testing enciphering: OAHQHCNYNXTSZJRRHJBYHQKSOUJY
ok 4 - Roundtrip: WELLDONEISBETTERTHANWELLSAID

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September, 22. And, please, also spread the word about the Perl Weekly Challenge if you can.

Perl Weekly Challenge # 24: Smallest Script and Inverted Index

These are some answers to the Week 24 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (September 8 , 2019). This blog post offers some solutions to this challenge, please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Smallest Script With No Execution Error

Create a smallest script in terms of size that on execution doesn’t throw any error. The script doesn’t have to do anything special. You could even come up with the smallest one-liner.

I was first puzzled by this strange specification. Can it be that we really want a script that does nothing? Does it have to be the shortest possible script.

Well, after reading again, yes, it seems so.

I'll go for one-liners.

My script in Perl 5:

$ perl -e ''

Just in case there is any doubt, we can check the return value under Bash to confirm that there was no error:

$ echo $?
0

And this is my script in Perl 6:

$ perl6 -e ''

Note that, in both Perl 5 and Perl 6, creating an empty file and using it as a parameter to the perl or perl6 command line would work just as well, for example:

$ perl6 my-empty-file.pl

And that's it for the first challenge. Boy, that was a quick one.

Inverted Index

Create a script to implement full text search functionality using Inverted Index. According to wikipedia:

In computer science, an inverted index (also referred to as a postings file or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database.

Inverted Index in Perl 5

I do not find the Wikipedia explanation to be very clear, but I'll implement the following: I have on my file system a directory containing about 500 Perl scripts (with a '.pl' extension). My program will read all these files (line by line), split the lines into words and keep only words containing only alphanumerical characters (to get rid of operators and variables names with sigils) and with a length of at least 3 such characters. These words will be used to populate a hash (actually a HoH), so that for each such word, I'll be able to directly look up the name of all the files where this word is used.

This is fairly simple:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @files = glob "./*.pl";
my %dict;
for my $file (@files) {
    open my $IN, "<", $file or die "Cannot open $file $!";
    while (my $line = <$IN>) {
        my @words = grep { /^\w{3,}$/ } split /\s+/, $line;;
        $dict{$_}{$file} = 1 for @words;
    }
    close $IN;
}
print Dumper \%dict;

The output has a bit less than 20,000 lines, which read in part as follows:

'checkdir' => {
                './monitor_files.pl' => 1,
                './monitor_files2.pl' => 1
              },
'start' => {
             './solver.pl' => 1,
             './url_regex.pl' => 1,
             './teams.pl' => 1,
             './test_start.pl' => 1,
             './markov_analysis.pl' => 1
           },
'1000' => {
            './first.pl' => 1,
            './jam1.pl' => 1
          },
'Minimal' => {
               './vigenere.pl' => 1
             },
'last' => {
            './strong_primes.pl' => 1,
            './pm_1196078.pl' => 1,
            './bench_lazy_map.pl' => 1,
            './inter_pairs.pl' => 1,
            './ladder2.pl' => 1,
            './perfect.pl' => 1,
            './homophones.pl' => 1,
            './pairs.pl' => 1,
            (...)

It wouldn't be difficult to store the output into a text file (that can then be reloaded into a Perl script hash) or into a database, or to find some other way of making the data persistent, but I have little use for such an index and the challenge specification does not request anything of that type. So, I will not try to go further.

Inverted Index in Perl 6

We'll do the same thing in Perl 6, but with another directory containing about 350 Perl 6 programs (with ".p6" or ".pl6" extensions).

use v6;

my @files = grep { /\.p6$/ or /\.pl6$/ }, dir('.');
my %dict;
for @files -> $file {
    for $file.IO.lines.words.grep({/^ \w ** 3..* $/}) -> $word {
        %dict{$word}{$file} = True;
    }
}
.say for %dict{'given'}.keys;

The program duly prints out the list of files with the given keyword:

$ perl6 inverted-index.p6
mult_gram.p6
calc_grammar.pl6
calculator-exp.pl6
VMS_grammar.p6
ana2.p6
calc_grammar2.pl6
ArithmAction.pl6

[... lines omitted for brevity]

normalize_url.p6
calculator.p6
arithmetic.pl6
json_grammar_2.pl6
point2d.pl6
arithmetic2.pl6
forest.p6

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September 15. And, please, also spread the word about the Perl Weekly Challenge if you can.

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.