Perl Weekly Challenge # 14: Van Eck's Sequence and US States

These are some answers to the Week 14 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Challenge # 1: Van Eck's Sequence

Write a script to generate Van Eck’s sequence starts with 0. For more information, please check out wikipedia page. This challenge was proposed by team member Andrezgz.

Jan Ritsema van Eck's sequence is an integer sequence defined recursively as follows. Let a(0) = 0. Then, for n ≥ 0, if there exists an m < n such that a(m) = a(n), take the largest such m and set a(n+1) = n − m; otherwise a(n+1) = 0. Thus, the first occurrence of an integer in the sequence is followed by a 0, and the second and subsequent occurrences are followed by the size of the gap between the two most recent occurrences.

This time, I'll start with Perl 6 (but complete the challenge in both P5 and P6).

Van Eck's Sequence in Perl 6

The definition is quite simple, but, for some reason (maybe the heatwave striking a good part of Western Europe these days), it took me more time (about 30 minutes) than I expected to get it right. Anyway, here we go:

use v6;

my @a = 0,;
for ^20 -> $n {
    my $result = 0;
    for reverse ^$n -> $m {
        $result = $n - $m and last if @a[$m] == @a[$n];
            # Note: $m is always smaller than $n, so $n - $m > 0
    }
    push @a, $result;
}
say @a;

Not much to say about it: we just apply the definition, and the code is pretty small. This outputs the following sequence:

~ perl6 vaneck.p6
[0 0 1 0 2 0 2 2 1 6 0 5 0 2 6 5 4 0 5 3 0]

Since we have nested loops, I have been thinking whether it might be possible to store the values in a hash or a set, or use a junction, to avoid performing the inner loop when possible.

But looking at a longer version of the sequence, for example:

[0 0 1 0 2 0 2 2 1 6 0 5 0 2 6 5 4 0 5 3 0 3 2 9 0 4 9 3 6 14 0 6 3 5 15 0 5 3 5 2 17 0 6 11 0 3 8 0 3 3 1 42 0 5 15 20 0 4 32 0 3 11 18 0 4 7 0 3 7 3 2 31 0 6 31 3 6 3 2 8 33]

it appears that, except at the very beginning of the sequence, zeros are relatively rare, meaning that there are only few cases where we will go through the whole list and not find an $m within the inner loop. Therefore, there is probably no or little point trying to optimize away the inner loop by a lookup mechanism: we will not get a very significant performance improvement.

Maybe a solution using the gather/take construct to produce lazy lists would be more idiomatic Perl 6 code. Let's see what we can do.

use v6;

sub MAIN ( UInt $max ) {
    my @a = lazy gather {
        take 0;
        for 0..* -> $n {
            my $result = 0;
            for reverse ^$n -> $m {
                $result = $n - $m and last if @a[$m] == @a[$n];
            }
            take $result;
        }
    }
    say join " ", @a[0..$max];
}

It works and produces the same output as before, and using a lazy infinite list may look slightly more Perl-6-ish, but it does not make the code simpler, quite to the contrary. So, ultimately, I tend to prefer my first solution.

Van Eck's Sequence in Perl 5

Translating the first P6 version into P5 is very easy:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my $max = shift;
my @a = (0);
for my $n (0..$max - 1) {
    my $result = 0;
    for my $m (reverse 0..$n-1){
        $result = $n - $m and last if $a[$m] == $a[$n];
    }
    push @a, $result;
}
say "@a";

And this outputs the same as the P6 script:

    $ perl vaneck.pl 20
    0 0 1 0 2 0 2 2 1 6 0 5 0 2 6 5 4 0 5 3 0

The Longest English Word from 2-Letter US States Abbreviations

Using only the official postal (2-letter) abbreviations for the 50 U.S. states, write a script to find the longest English word you can spell? Here is the list of U.S. states abbreviations as per wikipedia page. This challenge was proposed by team member Neil Bowers.

The examples provided clarify the requirement:

Pennsylvania + Connecticut = PACT
Wisconsin + North Dakota = WIND
Maine + Alabama = MEAL
California + Louisiana + Massachusetts + Rhode Island = Calamari

We don't know, however, whether a State abbreviation can be used more than once. We'll assume it's possible, but there would probably very little to change to prohibit reuse of abbreviations if needed.

The list of 50 US States abbreviations used by the United States Postal Service taken from the above-mentioned Wikipedia page is as follows:

AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY

My understanding is that the District of Columbia (DC) is not really a State (otherwise, there would be 51 States), so we will remove it from the list.

Just as for some previous challenges, I will use a words.txt file containing 113,809 lower-case English words usually accepted for crossword puzzles and other word games. The words.txt file can be found on my Github repository. The original list was contributed to the public domain by Internet activist Grady Ward in the context of the Moby Project. This word list is also mirrored at Project Gutenberg.

For the purpose of testing the programs below, the words.txt file is located in my current directory.

The word.txt input file contains words with only lowercase alphabetical ASCII characters. We'll probably need to turn them to upper case at some point (or to turn the State abbreviations to lower case).

Since all State abbreviations are 2-letter codes, we'll consider only words with an even number of letters. There are probably several ways to fulfill the requirement, but the simplest seems to break each word of our list into strings of two letters and to check whether each such string belongs to the State codes.

The Longest English Word in Perl 6

We first populate a Set with all the 50 State codes. Then we go through the word.txt file and, for each word with an even letter count, we break the word into pairs of characters and check whether each pair belongs to the set of State codes. If such is the case, we store the word in the $longest-word variable if it is longer than the previous value of $longest-word.

use v6;

my $codes = set < 
    AL AK AZ AR CA CO CT DE FL GA 
    HI ID IL IN IA KS KY LA ME MD 
    MA MI MN MS MO MT NE NV NH NJ 
    NM NY NC ND OH OK OR PA RI SC 
    SD TN TX UT VT VA WA WV WI WY >;

sub valid (Str $word) {
    for $word.uc.comb(2) -> $letter-pair {
        return False unless $letter-pair (elem) $codes;
    }
    return True;
}   

my $longest-word = "";
my $max-size = 0;
for "words.txt".IO.lines -> $word {
    next unless $word.chars %% 2;
    $longest-word = $word and $max-size = $word.chars 
        if valid $word and $word.chars > $max-size;
}
say $longest-word;

The longest word displayed on the screen when running the program is armorial, an eight-letter word.

Note that the $max-size variable isn't really needed, it is used only to cache the length of the current $longest-word in order to avoid recalculating it again and again each time through the loop and hopefully make the calculation a bit faster.

Adding a code line to print out intermediate results shows that my word list has in fact 13 eight-letter words made of US States postal codes:

armorial
calamine
gamodeme
ganymede
lavalava
mainland
malarial
mandarin
melamine
memorial
moorland
scincoid
utilidor

Also note that, since the program keeps track of the maximum size so far, it would be possible to improve performance by skipping words whose length is less than the longest candidate so far. But the script runs very fast (less than half a second), so I did not really bother to introduce this refinement.

Assuming we want to do it, the main loop of the above script would be rewritten as follows:

my $longest-word = "";
my $max-size = 0;
for "words.txt".IO.lines -> $word {
    my $length = $word.chars;
    next if $length % 2 or $length <= $max-size;
    $longest-word = $word and $max-size = $length 
        if valid $word;
}
say $longest-word;

With this change, the timing gets down to less than 0.3 second.

The Longest English Word in Perl 5

Perl 5 does not have sets, but we can use a hash instead. Also, to split the words of words.txt into two-letter strings, I use a regex. Otherwise, translating the code into Perl 5 is fairly straight forward:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my %codes = map { $_ => 1 } qw /
    AL AK AZ AR CA CO CT DE FL GA 
    HI ID IL IN IA KS KY LA ME MD 
    MA MI MN MS MO MT NE NV NH NJ 
    NM NY NC ND OH OK OR PA RI SC 
    SD TN TX UT VT VA WA WV WI WY /;

sub valid {
    my $word = uc shift;
    for my $letter_pair ( $word =~ /\w\w/g ) {
        return 0 unless defined $codes{$letter_pair};
    }
    return 1;
}

my $longest_word = "";
my $max_size = 0;
my $dict = "words.txt";
open my $IN, "<", $dict or die "Unable to open $dict $!";
while (my $word = <$IN>) {
    $word =~ s/[\n\r]+//g;
    next if length($word) % 2;  #skip if odd length
    $longest_word = $word and $max_size = length $word 
        if valid $word and length $word > $max_size;
}
say $longest_word;

And the printed result is again armorial.

The P5 script is a bit longer than the P6 script, but that's essentially because of some simple boiler-plate code needed in P5 (e.g. to open and read a file, etc.). I'm happy that I don't have to do that in P6, but that's not where P6 really shines brighter than P5.

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, July 7. And, please, also spread the word about the Perl Weekly Challenge if you can.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.