Perl Weekly Challenge # 9: Square Numbers and Functional Programming in Perl

In this other blog post, I provided some answers to Week 9 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Here, I want to use the opportunity of this challenge to illustrate some possibilities of functional programming in Perl (both Perl 5 and Perl 6) using the example of the first challenge of this week..

Challenge: Square Number With At Least 5 Distinct Digits

Write a script that finds the first square number that has at least 5 distinct digits.

A Data Pipeline in Perl 5

One of the solutions I suggested in my above-mentionned blog post was this script:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

for my $integer (100..1000) {
    my $square = $integer ** 2;
    my @digits = split //, $square;
    my %unique_digits = map {$_ => 1} @digits;
    if (scalar keys %unique_digits >= 5) {
        say "$integer -> $square";
        last;
    }
}

Although this is not the main discriminating feature of functional programming, one of the techniques commonly used in languages such as Lisp and its variants is data-flow programming or data pipeline: we take a list of data items and let them undergo a series of successive transformations to get to the desired result. The map function used above is an example of it: here it takes on its right-hand side a list of digits (the @digits array) as input and produces a list of pairs to populate the %unique_digits hash on the left-hand side. We can go further with this model and avoid these temporary variables.

The whole for loop above can be replaced by just three lines of code:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my @squares = map { $_ ** 2 } 100..200;
my @nums = grep { my %unique = map {$_ => 1} split //, $_; scalar keys %unique >= 5 ? 1 : 0} @squares;
say $nums[0];

The first line of real code should be read from right to left: we start with a range of integers (100..200), feed it to a map statement which produces the squares of these integers, and finally use the list thus generated to populate the @squares array. In a map statement, each value of the input list is aliased in turn to $_, so that the code block produces the squares of the input values.

The second line is a little bit more complicated. Basically, it takes the values of the @squares array as input and use the grep function to filter the squares that have 5 distinct digits. The grep code block builds a %unique hash for each number received as a parameter, and returns a true value for the input values that produce a hash with at least 5 items. Finally, values that are filtered are fed into the @num array. And the last line prints the first item of the @nums array, i.e. the first number having at least 5 distinct digits.

I must admit that this is probably not the best example to show the expressive power of data-flow processing. I could have built a simpler example for the purpose of a tutorial. But, on the other hand, it shows that I can do it with a real case imposed from outside.

From now on, we will drop the 4 boiler plate code lines at the script's beginning (the use ... lines) to avoid repetition in each code example, but they are of course necessary in any Perl 5 script (except possibly some simple one-liners).

Going one step further, the whole algorithm to find the first square number with 5 distinct digits can be rewritten as a single data pipeline:

say  +(grep { my %unique = map {$_ => 1} split //, $_; scalar keys %unique >= 5 ? 1 : 0} map { $_ ** 2 } 100..200)[0];

The one-liner solution presented in my other blog is essentially using the same techniques:

$ perl -E 'for (100..1000) { my %h = map {$_ => 1} split //, $_**2; say "$_ -> ", $_**2 and last if scalar %hash >= 5 }'
113 -> 12769

As mentioned in my original blog post, one slight problem with these implementations is that we don't really know in advance how large the range of successive integers needs to be. In that case, it is often better to use an infinite loop (for example while (1) { ... }) and to break out of it when we're done. Here, however, it seemed rather obvious to me that we would find a square with 5 distinct digits relatively quickly, so that for (100..1000) would certainly be a good enough approximation of an infinite range for our purpose.

Another possibility is to create an iterator. That's what we will do next.

Iterators, Closures and Anonymous Code References

Most programmers commonly use iterators, sometimes not knowing that it's called this way. For example, when you read a file line by line with a construct such as:

while (my $line = <$FH>) {
    # do something with $line
}

you're actually using an iterator.

An iterator is a function that returns values and keeps track of the last returned value to find out the next one. What we want here is a function that returns squares one by one, so that we don't need to compute values that are not needed. In our case, we would need a function that "remembers" the last integer it has used to generate the last square (or, alternatively, that remembers the next integer to use). For this, we could simply use a global variable, but that's considered bad practice. Rather, we will use a closure, i.e. a function that knows about the environment in which it was defined. For example, we could do something like this:

{
    my $num = 100;
    sub give_me_a_square {
        $num ++;
        return $num ** 2
    }
}
while (my $square = give_me_a_square()) {
    my %unique = map {$_ => 1} split //, $square;
    if (scalar keys %unique == 5) {
        say $square;
        last;
    }
}

Here, the give_me_a_square subroutine if defined within a block (the pair of curly braces) that creates a lexical scope within which the $num variable is also declared and initialized. Because of that, give_me_a_square "closes over" $num, it is a closure. When we call this subroutine, we are no longer within the scope where $num is defined, but the subroutine "remembers" about $num and about its current value.

Some people believe that closures have to be anonymous function, but this is not true: here, our give_me_a_square closure is a perfectly regular named subroutine. It is true, though, that closures are often anonymous code references, because the ability to pass around code references as an argument to another function or as a return value from a subroutine is part of their expressive power. So, a more canonical implementation of an iterator would use an anonymous code reference:

sub make_square_iterator {
    my $num = shift;
    return sub {
        $num++;
        return $num ** 2;
    }
}
my $square_iter = make_square_iterator 100;
while (my $square = $square_iter->()) {
    my %unique = map {$_ => 1} split //, $square;
    if (scalar keys %unique == 5) {
        say $square;
        last;
    }
}

The main advantage of this implementation over the previous one is that $num is no longer hard coded, but passed as an argument to the make_square_iterator subroutine, which means that we could call it several times with different initial values and generate as many iterators as we want, and each iterator would keep track of its own current value. Here, we need only one, and when make_square_iterator is called, it returns an anonymous subroutine or coderef which the caller stores in the $square_iter variable and calls each time it needs a new square.

The ability to create anonymous subroutines (as coderefs) dynamically is an essential part of Perl's expressive power.

To tell the truth, using an iterator for such a simple problem is a bit of an overkill, but I thought it constituted an interesting example to introduce this powerful technique.

Using a closure is the traditional way to create an iterator in Perl 5 since the beginning of Perl 5 in 1994. And this is what I commonly use at $work on some of our platforms where we are stuck with old versions of Perl. Version 5.10, however, introduced the state declarator which makes it possible to declare persistent private variables that are initialized only once (the first time the code line is executed). This feature needs to be activated, for example with a code line containing the use feature "state"; pragma. Using state variables makes the code of an iterator a bit simpler:

use feature qw/say state/;

sub provide_square {
    state $num = shift;
    return ++$num ** 2;
}
while (my $square = provide_square 100) {
    my %unique = map {$_ => 1} split //, $square;
    if (scalar keys %unique == 5) {
        say $square;
        last;
    }
}

To understand how this code works, remember that the state $num = shift; code line is executed only the first time the provide_square subroutine is called. On the following calls, $num is successively 101, 102, 103, etc.

Square Numbers in Perl 6

A data pipeline in functional style may look like this:

say first /\d+/, grep { 5 <= elems unique comb '', $_ }, map { $_ ** 2}, 100..*;

Note that first used as a functional subroutine needs a regex as a first argument. The /\d+/ isn't really useful for the algorithm, but is needed for first to work properly.

But we can use first with a grep-like syntax to make this more convenient:

say first { 5 <= elems unique comb '', $_ }, map { $_ ** 2}, 100..*;

The data pipeline may also use chained method invocations:

say (100..*).map(* ** 2).grep(*.comb.unique >= 5).first;

Perl 6 also has the ==> feed operator:

my $square = 100...* ==> map { $_ ** 2 } ==> grep(*.comb.unique >= 5)  ==> first /\d+/;
say $square;

or, probably better:

100...* ==> map { $_ ** 2 } ==> first(*.comb.unique >= 5)  ==> say();

There is also the <== leftward feed operator:

say()  <== first(*.comb.unique >= 5) <== map { $_ ** 2} <== 100..*;

We have no reason to try to build an iterator in Perl 6 as we did in Perl 5, since the lazy infinite list mechanism just offers what we need. But we can create an iterator if we want to. This is what it might look like using the state declarator:

sub provide-square (Int $in) {
    state $num = $in;
    return ++$num ** 2;
}
while my $square = provide-square 100 {
    if $square.comb.unique >= 5 {
        say $square;
        last;
    }
}

We could also create an iterator with a closure:

sub create-iter (Int $in) {
    my $num = $in;
    return sub {
        return ++$num ** 2;
    }
}
my &square-iter = create-iter 100;
while my $square = &square-iter() {
    if $square.comb.unique >= 5 {
        say $square;
        last;
    }
}

Acknowledgement

I originally learned about these techniques from Mark Jason Dominus's book, Higher Order Perl, probably the best CS book I've read in the last 15 years or so. The book is available for free on-line, but if you start reading it, you might very well end up buying a paper copy. At least, this is what happened to me, and I'm very happy to own a paper copy of it.

Wrapping up

The next week Perl Weekly Challenge is due to start very soon. If you're interested in participating in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 6 p.m. BST (British summer time) on next Sunday, June 2. And, please, also spread the word about the Perl Weekly Challenge if you can.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.