Perl Weekly Challenge 54: k-th Permutation Sequence and the Collatz Conjecture

By laurent_r on April 4, 2020 6:33 PM

These are some answers to the Week 54 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a couple of days (April 5, 2020). This blog post offers some solutions to this challenge, please don’t read on if you intend to complete the challenge on your own.

Task 1: kth Permutation Sequence

Write a script to accept two integers n (>=1) and k (>=1). It should print the kth permutation of n integers. For more information, please follow the wiki page.

For example, n=3 and k=4, the possible permutation sequences are listed below:

The script should print the 4th permutation sequence 231.

It took me some questioning to figure out the requirement. My understanding is that the program should first generate a list of integers between 1 and n and then look for permutations in ascending order, and finally display the kth permutation.

kth Permutation in Perl

We write a recursive permute subroutine that generates permutations in the desired ascending order. Then we can just stop recursion once it has been called k times (thus avoiding to calculate all permutations when no needed).

use strict;
use warnings;
use feature "say";

my ($n, $k) = @ARGV;
my $err_msg = "Please supply two integer parameters freater than 0\n";
die $err_msg unless $n and $k;
die $err_msg if $n !~ /^\d{1,2}$/ or $k !~ /^\d+$/;
my @start = 1..$n;
permute("", @start);

sub permute {
    my ($str, @vals) = @_;
    if (scalar @vals == 0) {
        say $str and exit unless --$k; 
        return;
    }
    permute("$str " . $vals[$_], @vals[0..$_-1], @vals[$_+1..$#vals]) for 0..$#vals;
}

With the parameters n=3 and k=4, the program displays the following output:

$ perl permute.pl 3 4
 2 3 1

Note that I have decided to insert a space between the individual digits, as it makes it easier to visualize the individual values of the output when n is greater than 9 (and thus has more than one digit). For example, for the 350,000th permutation of the 1-35 range:

$ time perl permute.pl 35 350000
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 35 32 30 27 33 29 28 34 31

real    0m0,928s
user    0m0,890s
sys     0m0,030s

This is fairly fast: we’ve just computed the first 350,000 permutations of 35 items in less than one second. In case you need more speed with larger input values, you might try modules like Algorithm::Permute or ntheory. They are likely to be significantly faster. But I did not feel it was needed here.

kth Permutation in Raku

Raku has a built-in method, permutations that returns all possible permutations of a list as a Seq of lists. In addition, if the input list is in ascending order, the output permutation will also be in ascending order.

For example, with an input list of 1, 2, 3, the fourth permutation is:

perl6 -e 'say (1..3).permutations[3];'
(2 3 1)

In addition, although the documentation doesn’t state it explicitly, it appears that the permutations method acts lazily, i.e. it only generates the permutations needed for computing the desired final result. For example, the following one-liner computes the result (the 4th permutation) almost immediately (in less than one hundredth of a second):

$ perl6 -e 'say (1..20).permutations[3]; say now - INIT now;'
(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 19 20 18)
0.0089752

which would obviously not be the case if it had to compute every permutation of a 20-item list before finding the fourth one (for a 20-item list, the number of permutations is 20!, i.e. 2432902008176640000, or about 2.4 billions of billions).

So we can write a one-liner script that accepts two integers i and k and prints the kth permutation of n integers as per the requirement:

$ perl6 -e 'say (1..@*ARGS[0]).permutations[@*ARGS[1]-1];' 3 4
(2 3 1)

If you prefer a stand-alone script, we can write this:

use v6;

sub MAIN (Int $n where * > 0, Int $k where * > 0) {
    (1..$n).permutations[$k - 1].say;
}

This outputs the same result as before:

$ ./perl6 permute.p6 3 4
(2 3 1)

Task 2: the Collatz Conjecture

It is thought that the following sequence will always reach 1:

$n = $n / 2 when $n is even
$n = 3*$n + 1 when $n is odd

For example, if we start at 23, we get the following sequence:

23 → 70 → 35 → 106 → 53 → 160 → 80 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

Write a function that finds the Collatz sequence for any positive integer. Notice how the sequence itself may go far above the original starting number.

Extra Credit: have your script calculate the sequence length for all starting numbers up to 1000000 (1e6), and output the starting number and sequence length for the longest 20 sequences.

The Collatz conjecture concerns a sequence defined as follows: start with any positive integer n. Then each term is obtained from the previous term as follows: if the previous term is even, the next term is one half of the previous term. If the previous term is odd, the next term is 3 times the previous term plus 1. The conjecture is that, no matter what value of n, the sequence will always reach 1. This conjecture is named after Lothar Collatz who introduced it in 1937. It is sometimes known as the Syracuse problem (and some other names). It is usually believed to be true (and no counter-example has been found), but, despite a lot of efforts, nobody has been able to prove it, and this is deemed to be a very difficult problem.

The Collatz Conjecture in Perl

The Basic Task

For the purpose of the basic task, this is fairly straight forward. Here, we write a next_collatz subroutine that, given an integer computes the next number in the Collatz sequence. And we call that subroutine in a loop until we reach 1:

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub next_collatz { 
    my $num = shift;
    $num % 2 ? 3 * $num + 1 : $num / 2;
}

my $n = shift;
my @result = ($n);
while (1) {
    $n = next_collatz $n;
    push @result, $n;
    last if $n == 1;
}
say "@result";

These are some example outputs:

$ perl collatz.pl 23
23 70 35 106 53 160 80 40 20 10 5 16 8 4 2 1

$ perl collatz.pl 24
24 12 6 3 10 5 16 8 4 2 1

$ perl collatz.pl 25
25 76 38 19 58 29 88 44 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1

$ perl collatz.pl 26
26 13 40 20 10 5 16 8 4 2 1

$ perl collatz.pl 27
27 82 41 124 62 31 94 47 142 71 214 107 322 161 484 242 
121 364 182 91 274 137 412 206 103 310 155 466 233 700 
350 175 526 263 790 395 1186 593 1780 890 445 1336 668 
334 167 502 251 754 377 1132 566 283 850 425 1276 638 
319 958 479 1438 719 2158 1079 3238 1619 4858 2429 
7288 3644 1822 911 2734 1367 4102 2051 6154 3077 9232 
4616 2308 1154 577 1732 866 433 1300 650 325 976 488 
244 122 61 184 92 46 23 70 35 106 53 160 80 40 20 
10 5 16 8 4 2 1

(The latest example just above was slightly reformatted for the purpose of this blog post.)

Extra Credit: Collatz Sequence for all Numbers up to 1,000,000

In theory, it wouldn’t be very complicated to encapsulate the above program into an outer loop to compute the Collatz sequence for any range of numbers. Except that going all the way up to 1,000,000 is probably going to take ages. One of the reason is that we’re going to recompute Collatz sequence successors for the same number again and again many times. If you look at the above examples, the sequences all end up with the following sequence: 16 8 4 2 1. So, it might be useful, when we reach 16 for the first time, to compute the end of the sequence only once, and to store it in a hash of arrays (or possibly an array of arrays), in order to retrieve it straight from the hash when we reach 16 once more. Similarly, the sequence for 25 end with 40 20 10 5 16 8 4 2 1. If we store this sequence somewhere, then we don’t have to compute it once more when we reach 40 while computing the Collatz sequence for 27, and, or course, also when we compute the Collatz sequence for 40, 80, 160, as well as 13, 26, 52, etc. Such a strategy is called caching or memoizing: storing in memory the result of a computation that we’re likely to have to compute again. It is sometimes described as “trading memory for time.”

There is a core module, called Memoize, written my Mark Jason Dominus, that is very easy to use can do the caching automatically for you. The problem though is that it wouldn’t be very practical to use it here, because we don’t want to cache just the next item in the sequence, but all the rest of the sequence down to 1. So it might be better to implement a cache ourselves, manually (that’s not very difficult, as we shall see).

There is another problem though, which is much more delicate. Since the requirement is to compute the Collatz sequence for all integers up to 1,000,000, the cache will grow very large (several millions of sequences) and we might run out of memory. In the first version of the program below, I tried to store all sequences up to one million, and the program turned out to be painfully slow. Looking at the system statistics, I found that, after a while, available memory became exhausted and the system would swap memory on the disk, leading to very slow execution. I made a couple of tests, and found that I could store the sequences for all numbers up to about 300,000 without exceeding the available memory of my computer (that number might be different on your computer), thus preventing the process from swapping and getting more or less the best possible performance, hence the MAX constant set to 300,000. Since I knew from earlier tests that the 20 longest sequences would all have more than 400 items, I also hard-coded a lower limit of 400 items for the sequences whose length had to be recorded. Another possibly better solution might have been to maintain a sliding array of the top 20 sequences, but I feared that maintaining this array many times over the execution of the program would end up impairing performance.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Data::Dumper;
use constant MAX => 300000;

my %cache;

sub collatz_seq {
    my $input = shift;
    my $n = $input;
    my @result;
    while ($n != 1) {
        if (exists $cache{$n}) {
            push @result, @{$cache{$n}};
            last;
        } else {
            my $new_n = $n % 2 ? 3 * $n + 1 : $n / 2;
            push @result, $new_n;
            $cache{$n} = [$new_n, @{$cache{$new_n}}] 
                if defined ($cache{$new_n}) and $n < MAX;
            $n = $new_n;
        }
    }
    $cache{$input} = [@result] if $n < MAX;
    return @result;
}

my @long_seqs;
for my $num (1..1000000) {
    my @seq = ($num, collatz_seq $num);
    push @long_seqs, [ $num, scalar @seq] if scalar @seq > 400;
}

@long_seqs = sort { $b->[1] <=> $a->[1]} @long_seqs;
say  "$_->[0]: $_->[1]" for @long_seqs[0..19];
# say "@{$cache{$long_seqs[0][0]}}";

With these optimizations, I was able to reduce execution time to 1 min 7 sec.:

$ time perl collatz.pl
837799: 525
626331: 509
939497: 507
704623: 504
910107: 476
927003: 476
511935: 470
767903: 468
796095: 468
970599: 458
546681: 452
818943: 450
820022: 450
820023: 450
410011: 449
615017: 447
886953: 445
906175: 445
922524: 445
922525: 445

real    1m7,469s
user    1m6,015s
sys     0m1,390s

Uncomment the last statement if you want to see the longest sequence (with 525 items).

Update: A couple of days after I posted this, I figured out a much better caching strategy removing the difficulties explained above and giving much better performance. It is explained in this blog post.

The Collatz Conjecture in Raku

The Basic Task

For the purpose of the basic task, this is fairly straight forward. Just as for the Perl solution, we write a collatz-seq subroutine that, given an integer computes the next number in the Collatz sequence. And we call that subroutine in a loop until we reach 1:

use v6;

sub collatz-seq (UInt $in) {
    my $n = $in;
    my @result = gather {
        while $n != 1 {
            my $new-n = $n % 2 ?? 3 * $n + 1 !! $n / 2;
            take $new-n;
            $n = $new-n;
        }
    }
    return $in, |@result;
}
sub MAIN (UInt $in) {
    my @seq = collatz-seq $in;
    print "Collatz sequence for $in: ", @seq, "\n";
}

Here are a few sample runs:

$ perl6  collatz_1.p6 8
Collatz sequence for 8: 8 4 2 1

$ perl6  collatz_1.p6 23
Collatz sequence for 23: 23 70 35 106 53 160 80 40 20 10 5 16 8 4 2 1

$ perl6  collatz_1.p6 25
Collatz sequence for 25: 25 76 38 19 58 29 88 44 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1

Note that I used the print function rather than the say function here, because say would abbreviate long sequences (for example, the sequence for 27 would end with an ellipsis: ...).

Extra Credit: Collatz Sequence for all Numbers up to 1,000,000

Please refer to the Extra Credit subsection in the Perl section above for a detailed discussion of the caching strategy. The program below is essentially a port to Raku of the Perl program above:

use v6;

my %cache = 2 => [2, 1];

sub collatz_seq (UInt $in) {
    my @result;
    my $n = $in;
    while $n != 1 {
        if %cache{$n} :exists {
            push @result, |@(%cache{$n});
            last;
        } else {
            my $new_n = $n % 2 ?? 3 * $n + 1 !! $n / 2;
            push @result, $new_n;
            %cache{$n} = [$new_n, |%cache{$new_n}] 
                if defined (%cache{$new_n}) and $new_n <= 200000;
            $n = $new_n.Int;
        }
    }
    %cache{$in} = @result if $in <= 200000;
    return @result;
}

my @long_seqs;
for 1..1000000 -> $num {
    my $seq = collatz_seq $num;
    push @long_seqs, [ $num, $seq.elems] if $seq.elems > 400;
}
@long_seqs = sort { $^b[1] <=> $^a[1]}, @long_seqs;
say  "$_[0]: $_[1]" for @long_seqs[0..19];

This program displays more or less the same output as the previous Perl program:

$ perl6 collatz.p6
837799: 525
626331: 509
939497: 507
704623: 504
910107: 476
927003: 476
511935: 470
767903: 468
796095: 468
970599: 458
546681: 452
818943: 450
820022: 450
820023: 450
410011: 449
615017: 447
886953: 445
906175: 445
922524: 445
922525: 445

This program ran in more than 9 minutes, so Raku is still significantly slower than Perl (at least for such CPU intensive computations).

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, April 12, 2020. And, please, also spread the word about the Perl Weekly Challenge if you can.

1 comment

1 Comment

laurent_r | April 16, 2020 8:46 AM | Reply

Concerning the caching difficulties in the case of the "extra-credit" for the Collatz sequence, I have found a much better caching strategy, explained in the following blog post: https://blogs.perl.org/users/laurent_r/2020/04/revisiting-the-collatz-sequence-pwc-54.html.

Name

Email Address

URL

Remember personal info?

Comments (You may use HTML tags for style)

About laurent_r

I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.

More info »

laurent_r