Perl Weekly Challenge: Week 3

These are some answers to the Week 3 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Challenge #1: 5-Smooth Numbers

Create a script to generate 5-smooth numbers, whose prime divisors are less or equal to 5. They are also called Hamming/Regular/Ugly numbers. For more information, please check this wikipedia page.

Regular or 5-smooth numbers (or Hamming numbers) are numbers whose prime divisors are only 2, 3, and 5, so that they evenly divide some powers of 30.

A Perl 5 Solution

Generating just some 5-smooth numbers is a trivial problem. For example, if you want 6 such numbers, you only need to generate the first six powers of 2 (or the first six powers of 3, or six powers of 5), as in this Perl one-liner:

$ perl -E 'say 2 ** $_ for 1..6;'
2
4
8
16
32
64

This is really too simple, so my guess is that, perhaps, what is wanted is maybe something like: generate a sequence of all 5-smooth numbers smaller than a given upper bound (say 100). Such a sequence is sometimes called a Hamming sequence. Or maybe that's not really the requirement, but let's

We could do it with a brute-force approach: check all integers between 1 and 100, perform a prime factor decomposition of each of them and check whether any of the prime factors is larger than 5. This would be rather inefficient, though, with a lot of useless computations. An alternative would be to generate a list of primes between 1 and the upper bound and to check for each number in the range whether it can be evenly divided by any of the primes larger than 5. In either case, we need to build a list of prime numbers.

Building a List of Prime Numbers

There are several fast CPAN modules for prime numbers calculation (e.g. https://metacpan.org/pod/Math::Prime::Util1, https://metacpan.org/pod/Math::Prime::XS, https://metacpan.org/pod/Math::Prime::FastSieve, etc.), but this being a coding challenge, using other libraries might be frowned upon. Let's build such a list of primes between 1 and 100 in pure Perl.

#!/usr/bin/perl
use strict;
use warnings;
use feature "say";
use constant largest_num => 100;

sub find_primes {
    my $num = 5;
    my @primes = (2, 3, 5);
    while (1) {
        $num += 2;     # check only odd numbers
        last if $num > largest_num;
        my $limit = int $num ** 0.5;
        my $num_is_prime = 1;
        for my $prime (@primes) {
            last if $prime > $limit;
            if ($num % $prime == 0) {
                # $num evenly divided by $prime, $num is not prime and exit the for loop
                $num_is_prime = 0;
                last;
            }
        }
        push @primes, $num if $num_is_prime; #  Found a new prime, add it to the array of primes
    }
    return @primes;
}
my @prime_numbers = find_primes;  
print "@prime_numbers \n";

This display the following list:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

If we need a larger list of prime numbers, we can just change the value of the largest_num constant.

The find_primes subroutine uses the basic naive algorithm, with just a few performance improvement. The most basic algorithm to list prime numbers is to go through the sequence of integers (the "prime candidates") and, for each of them, try to divide it by all integers smaller than it, from the smallest (2) to the largest. The search can be stopped as soon as we find a number that evenly divides the number being checked for primality. These are some possible performance improvements:

  • The first improvement is to verify primality only for odd numbers, because even numbers can be divided evenly by 2 and are therefore not prime (with the exception of 2 itself, which is even and prime and may therefore need to be treated as a special case).

  • The second improvement is that we can also use odd numbers for the divisors to be tried (since we are checking only odd numbers, a divisor cannot be even). These two changes reduce by a factor of close to 4 the number of even divisions to be performed (at least for a number that turns out to be prime).

  • A further improvement is that we can check divisors up to the square root of the prime candidate, because, since we have checked all smaller possible divisors.

  • Finally, the last improvement implemented above is that we construct the list of prime numbers as we go (in the @primes array). Rather than trying to evenly divide each prime candidates by each odd number below the limit (square root of the candidate), we only try even division by the prime numbers found this far (in the @primes array).

We can't do much more to improve the algorithm (well, it is possible to find some additional refinements, but the new performance gains will be quite small and relatively insignificant compared to those already achieved). It is possible however to use totally different algorithms that will be far more efficient, especially for large prime candidates, such as the Miller-Rabin algorithm that will be mentioned below in the Perl 6 section.

Building the Hamming Sequence

Now if is easy to go through all the integers between 1 and 100 and find out if they can be divided evenly by any of the primes larger than 5:

my @prime_numbers = grep $_ > 5, find_primes;
my @regulars;
for my $num (1 .. 100) {
    my $is_regular = 1;
    for my $prime (@prime_numbers) {
        last if $prime > $num;
        if ( $num % $prime == 0) {
            $is_regular = 0;
            last;
        }
    }
    push @regulars, $num if $is_regular;    
}
print "@regulars \n";

Thus the Hamming sequence starts as follows:

1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 40 45 48 50 54 60 64 72 75 80 81 90 96 100

Well, after all, it seems that all this may be a little be too complicated for the beginner's challenge.

Another Approach: Producing Directly the Products of the Powers of 2, 3, and 5

Let's consider another approach: can we construct directly a list of numbers less than 100 that are products of powers of 2, 3, and 5?

The largest power of 2 will be 6 (2 ** 7 is 128), the largest power of 3 will be 4 and the largest power of 5 will be 2. We will get some numbers larger than 100, but we can filter them out afterwards. This might give something like this:

use strict; 
use warnings;
use constant limit => 100;
my @raw_hamming;
for my $pow2 (0..6) {
    for my $pow3 (0..4) {
        for my $pow5 (0..2) {
            push @raw_hamming, 2 ** $pow2 * 3 ** $pow3 * 5 ** $pow5;
        }
    }
}
my @hamming_sequence = sort { $a <=> $b } grep $_ <= limit, @raw_hamming;
print "@hamming_sequence \n";

And we get the same Hamming sequence as before:

1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 40 45 48 50 54 60 64 72 75 80 81 90 96 100

In the code above, I've hard-coded the ranges for simplicity and convenience, but the range upper bounds can be computed as follows: for the powers of 2, the range upper bound is the binary logarithm of the limit. Similarly, for the powers of 3, the upper range is the logarithm to base 3 of the limit, and likewise for 5. And since Perl 5 only has natural logarithms, remember, for example, that the binary logarithm of 1000 is equal to log 1000 / log 2 and that, similarly, the logarithm to base 3 of 1000 if log 1000 / log 3. The next code snippet shows the full calculation.

This works properly and the code is much simpler than before and probably significantly more efficient, but we're still doing quite a lot of useless calculations. And this does not scale very well, because the number of useless calculations increases faster than the number of useful calculations when the upper limit grows higher. Let's try to cut down useless calculations. We can stop the various for loops as soon as an intermediary or final result becomes too large. This will also make it possible to remove hard-coded limits.

use strict;
use warnings;
use constant limit => 10000000;
my $log_limit = log limit;
my @unsorted_hamming;
my ($max2, $max3, $max5) = (int $log_limit/log 2, int $log_limit/log 3, int $log_limit/log 5);
for my $pow2 (0..$max2) {
    my $result_2 = 2 ** $pow2;
    last if $result_2 > limit;
    for my $pow3 (0..$max3) {
        my $result_2_3 = $result_2 * 3 ** $pow3;
        last if $result_2_3 > limit;
        for my $pow5 (0..$max5) {
            my $result_2_3_5 = $result_2_3 * 5 ** $pow5;
            last if $result_2_3_5 > limit;
            push @unsorted_hamming, $result_2_3_5;
        }
    }
}
my @hamming_sequence = sort { $a <=> $b } @unsorted_hamming;
print "@hamming_sequence \n";

And we get the same Hamming sequence as before:

1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 40 45 48 50 54 60 64 72 75 80 81 90 96 100

The code is now a bit more complicated, but it does no longer perform any useless calculation and is now very efficient. Changing the limit constant to 10 million, I computed all the 5-smooth numbers below 10 million (there are 768 of them) in about .2 second on my 8-year old laptop:

$ perl hamming.pl
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 40 45 48 50 54 60 64 72 75 80 81 90 96 100 108 120 125 128 
(...)
9447840 9565938 9600000 9720000 9765625 9830400 9841500 9953280 10000000

real    0m0.215s
user    0m0.000s
sys     0m0.077s

I was disappointed, however, that the optimized version does not run faster than the initial one, despite that fact that it supposedly does less work. I guess the lesson is clear: don't try to (micro-) optimize something that runs fast enough anyway.

Challenge #1 in Perl6

If we want to use the first approach above, Perl 6 has some features that are worth mentioning.

First, there is a built-in is-prime subroutine, which implements the very fast Miller-Rabin algorithm for figuring out whether an integer is prime. The is-prime subroutine returns False if this integer is not a prime, and it returns True if the integer is a known prime or if it is likely to be a prime based on the probabilistic Miller-Rabin test. In other words, the Miller-Rabin test is probabilistic and it is possible (though very unlikely) that is-prime will return True for a number that is not prime. In fact, the probability of occurrence of such an event is so low that it is said to be much less likely to happen than having a cosmic ray hitting your CPU at the wrong moment and disrupting its function to the point of giving you the wrong answer.

So building the list of primes between 1 and 100 is just one line of code, shown here in the REPL:

> my @primes = grep {.is-prime}, 1..100
[2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]

We don't even need to specify how many primes we want in out list: we can build a lazy infinite list of prime numbers:

> my $primes := grep {.is-prime}, $list;
(...)
> say $primes[4];   # Fifth prime number
11
> say $primes[999]; # Thousandth prime number
7919

Here, $primes is an infinite list or prime numbers. Quite obviously, the computer did not calculate and populate an infinite list of primes. It is a lazy list, which means that the program now knows how to calculate any element of the list, but it will actually do so only when required. This is great because we don't need to know in advance how many primes we really need: we just prepare a lazy infinite list, and the program will compute only the primes that are actually needed by the program.

Building the Hamming Sequence

We can translate in Perl 6 our original Perl 5 program to display a Hamming sequence:

my @prime_numbers = grep {.is-prime}, 5^..Inf;    # we need only primes strictly larger than 5
my @regulars;
for (1 .. 100) -> $num {
    my $is_regular = True;
    for @prime_numbers -> $prime {
        last if $prime > $num;
        if ( $num %% $prime ) {
            $is_regular = False;
            last;
        }
    }
    push @regulars, $num if $is_regular;    
}
say @regulars;

This will print the same Hamming sequence as before:

[1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 40 45 48 50 54 60 64 72 75 80 81 90 96 100]

Producing Directly the Products of the Powers of 2, 3, and 5

Rather than translating into Perl 6 our Perl 5 script, we will use the Perl 6 cross (X) metaoperator to generate all the products. I knew intuitively that this could certainly be done, but I must admit it took a little bit of thinking to figure out a good way to implement it.

The cross operator operates on two or more lists and generates a Cartesian product of all elements. Here is an example in the REPL:

> say <a b c> X <1 2 3>;
((a 1) (a 2) (a 3) (b 1) (b 2) (b 3) (c 1) (c 2) (c 3))

Used as a metaoperator, X will apply the associated operator to all the generated tuples. For, example, we can use it with the concatenation operator (X~) to generate strings from the tuples:

> say <a b c> X~ <1 2> X~ <y z>;
(a1y a1z a2y a2z b1y b1z b2y b2z c1y c1z c2y c2z)

We can use the cross metaoperator together with the multiplication operator (X*) to generate the products of the various powers of 2, 3 and 5:

my %powers;
for 2, 3, 5 -> $n {%powers{$n} = (1, $n, $n**2 ... *);} 
my @hamming_sequence = sort grep { $_ <= 100}, 
    (%powers{2}[0..6] X* %powers{3}[0..4] X* %powers{5}[0..2]);

First, we use the sequence (...) to generate infinite lists of the powers of 2, 3 and 5 (stored in the %powers hash, and, then, we use X* to generate all the products, and finally apply a grep to keep only the 5-smooth numbers smaller than 100 and sort the result.

We obtain the same Hamming sequence as before.

Challenge #2: Pascal's Triangle

Create a script that generates Pascal Triangle. Accept number of rows from the command line. The Pascal Triangle should have at least 3 rows. For more information about Pascal Triangle, check this wikipedia page.

The most typical way to construct Pascal's triangle is to deduct one line from the previous one. For example, the 3rd line is 1 2 1. We mentally add a 0 before and after the list and construct the 4th line by adding two by two the numbers of the third line:

0 1 2 1 0
 1 3 3 1

Next, we do it again with the fourth line just produced:

0 1 3 3 1 0
 1 4 6 4 1

You don't really need to mentally add the leading and trailing zeroes: you can just say that the coefficients are either the sum of the two coefficients above, and 1 when there is only one coefficient above. But adding a 0 before and after the list will be useful to avoid Use of uninitialized value warnings in some of the solutions below (although it could be done in several other ways).

Perl 5

Two Nested Loops

We can use simply two nested for loops like this:

use strict;
use warnings;
use feature "say";
my $nb_rows = shift;
my @row = (1);
for (1 .. $nb_rows) {
    say "@row";
    my @temp = (0, @row, 0);
    @row = ();
    for my $index (0 .. $#temp -1) {
        push @row, $temp[$index] + $temp[$index + 1];
    }
}

With an input parameter of 10 rows, this displays the following:

$ perl pasc.pl 10
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1

This is not very complicated, but, as it is often the case with such loops, it is a little bit difficult to get the management of indices right. Also, I frankly don't like too much having to use a @temp array.

A Recursive Approach

Since each line is derived from the previous one, it seems it might be interesting to try a recursive approach:

use strict;
use warnings;
use feature "say";
sub pascal {
    my @row;
    my $line_count = shift;
    return unless $line_count;
    for my $index (0 .. $#_ - 1) {
        push @row, $_[$index] + $_[$index + 1];
    }
    say "@row";
    pascal ($line_count - 1, 0, @row, 0);
}
my $nb_rows = shift;
my @line = (1);
say "@line";
pascal ($nb_rows - 1, 0, @line, 0);

This works fine and produces the same output as before, but this is not really better.

Functional Programming Approach

Let's try to see whether we can do better with a functional programming oriented approach.

use strict;
use warnings;
use feature "say";
my $nb_rows = shift;
my @line = (1);
for my $row (1 .. $nb_rows) {
    say "@line";
    @line = (1, (map $line[$_] + $line[$_ + 1], 0 .. $row - 2), 1);
}

The code is now much shorter and, in my humble opinion, also clearer.

Perl 6

We can first translate into Perl 6 the last P5 solution (functional programming):

sub pascal ($nb-rows) {
    my @line = 1,;
    for 1 .. $nb-rows -> $row {
        @line.join(" ").say;
        @line = flat 1, (map {@line[$_] + @line[$_ + 1]}, 0 .. $row - 2), 1;
    }
}
sub MAIN (Int $rows where * > 0) {
    pascal $rows
}

Aside from a few small syntactic changes, the only significant difference with P5 is that we've used the MAIN subroutine signature to validate the input parameter, which obviously has to be a positive integer.

Lets try a recursive version:

sub pascal ($nb-rows, @line is copy) {
    return unless $nb-rows;
    @line.join(" ").say;
    @line = flat 1, (map {@line[$_] + @line[$_ + 1]}, 0 .. @line.elems - 2), 1;
    pascal $nb-rows - 1, @line;
}
sub MAIN (Int $rows where * > 0) {
    pascal $rows, (1,);
}

This recursive version is slightly more concise (one line less), but the gain is small.

Using the sequence operator

Since each line is generated from the previous one, we can use the ... sequence operator with a generator to produce a list of lines of Pascal's triangle:

sub pascal ($nb-rows) {
    my @lines = 1, -> $line { 
        flat 1, (map {$line[$_] + $line[$_ + 1]}, 0 .. $line.elems - 2), 1 
    } ... +$nb-rows;  
}
sub MAIN (Int $rows where * > 0) {
    my @triangle = pascal $rows;
    .join(" ").say for @triangle;
}

Note that we need to numify $nb-rows using the + prefix operator for the sequence to work properly.

Using the zip operator

We can do slightly better, however, using the Z zip metaoperator operator. To start exploring this, let's first consider any line of Pascal's triangle (for example 1 2 1) and try to build the next one. For now, we do it in the REPL:

> @a = 1, 2, 1;
[1 2 1]
> say (0, @a).flat Z (@a, 0).flat;
((0 1) (1 2) (2 1) (1 0))
> say 0, |@a Z |@a, 0;
((0 1) (1 2) (2 1) (1 0))
> @a = 0, |@a Z+ |@a, 0;
[1 3 3 1]
> @a = 0, |@a Z+ |@a, 0;
[1 4 6 4 1]

As you can see, combining [ 0, 1, 2, 1] and [ 1, 2, 1, 0] with the zip operator produces 4 pairs (((0 1) (1 2) (2 1) (1 0))) whose sums are the coefficient of the next line of Pascal's triangle. Note that we need to flatten the operands of the zip operator with the flat method call or with the | operator. When using the zip metaoperator with the addition operator we obtain the next line ([1 3 3 1]) of Pascal's triangle. And since we are storing the result in @a, doing it again will produce the following line, and so on.

We can rewrite the resursive version as follows:

sub pascal ($nb-rows, @line is copy) {
    return unless $nb-rows;
    @line.join(" ").say;
    @line =  0, |@line Z+ |@line, 0;
    pascal $nb-rows - 1, @line;
}
sub MAIN (Int $rows where * > 0) {
    pascal $rows, (1,);
}

Now, combining this with the previous improvement, we can use the ... sequence operator to produce a list of lines of Pascal's triangle.

> for 1, -> $line { [0, |$line Z+ |$line, 0] } ... 10 { .join(" ").say };
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1

This is the new version (no longer recursive) of the pascal subroutine:

sub pascal ($nb-rows) {
    for 1, -> $line { [0, |$line Z+ |$line, 0] } ... $nb-rows { 
        .join(" ").say 
    }
}
sub MAIN (Int $rows where * > 0) {
    pascal +$rows;
}

The pascal subroutine is now so short that we no longer need to store its code in a separate subroutine.

sub MAIN (Int $rows where * > 0) {
    .join(" ").say for 1, -> $line { [0, |$line Z+ |$line, 0] } ... +$rows;
}

Finally, we could use an infinite (lazy) sequence and print the range that we need:

sub MAIN (Int $rows where * > 0) {
    my @lines = 1, -> $line { [0, |$line Z+ |$line, 0] } ... +$rows;
    join(" ").say for @lines[^$rows]
}

Wrapping up

The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, April, 21. And, please, also spread the word about the Perl Weekly Challenge if you can.

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.