Perl Weekly Challenge 75: Coin Sums and Largest Rectangle Histogram

These are some answers to the Week 75 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a few days from now (August 30, 2020). This blog post offers some solutions to this challenge, please don’t read on if you intend to complete the challenge on your own.

Task 1: Coins Sums

You are given a set of coins @C, assuming you have infinite amount of each coin in the set.

Write a script to find how many ways you make sum $S using the coins from the set @C.

Example:

Input:
    @C = (1, 2, 4)
    $S = 6

Output: 6
There are 6 possible ways to make sum 6.
a) (1, 1, 1, 1, 1, 1)
b) (1, 1, 1, 1, 2)
c) (1, 1, 2, 2)
d) (1, 1, 4)
e) (2, 2, 2)
f) (2, 4)

Coin Sums in Raku

The first idea that might come to mind may be to use the combinations and/or permutations, or some combination thereof, to generate all the possible coin combinations and then to filter out those not matching the target value (and also remove duplicates). I’ve done that in some previous Perl Weekly Challenges. However, my experience tells me that, even for moderately large input data, this process would generate a lot, I really mean A LOT, of useless combinations leading to poor performance.

One alternative is to construct the various coin combinations by iterating over the possible values. If we knew in advance that we were going to have, say, three coin values, it would be very easy to implement three nested for loops to test all possible combinations. But when we don’t know in advance how many coin values we’re going to get, then it is usually much simpler to use a recursive subroutine. In the program below, the recursive find-sum subroutine loops over the input values and calls itself again for each of these values. The recursion stops when the sum obtained so far is equal to the target sum (in which case it stores the solution) or greater than it. The solutions obtained are sorted in ascending order, stringified and stored in a SetHash to remove any duplicate.

use v6;

my $target_sum = shift @*ARGS;
my @coins = sort @*ARGS;
my SetHash $result;
find-sum(0, []);
.say for $result.keys.sort;

sub find-sum (Int $start, @allocated-so-far) {
    return if $start > $target_sum;
    if $start == $target_sum {
        $result{join " ", sort @allocated-so-far}++;
        return;
    }
    for @coins -> $coin {
        my @new-allocated =  | @allocated-so-far, $coin;
        my $new-sum = $start + $coin;
        find-sum($new-sum, @new-allocated);
    }
}

This works fine for small input:

$ ./raku coin-sum.raku 6 1 2 4
1 1 1 1 1 1
1 1 1 1 2
1 1 2 2
1 1 4
2 2 2
2 4

However, if we set the target value to, say, 20 and use more coin values, the program starts to take quite a lot of time to run. For example, more than 27 seconds in the following example:

$ time raku coin-sum.raku 20 1 2 4 5
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
(68 lines omitted for brevity)
4 4 4 4 4
5 5 5 5

real    0m27,397s
user    0m0,015s
sys     0m0,062s

The reason for this poor performance is that the program inspects a lot of values that turn out to be duplicates. For example, in the case of the first example above with a target value of 6, the program finds:

1 1 1 1 2

but it also tests:

1 1 1 2 1
1 1 2 1 2
1 2 1 1 1
2 1 1 1 1

All these values are really duplicates and are subsequently eliminated, but that’s a lot of useless work.

And it gets much worse for larger target values and larger coin value sets.

We can improve considerably the program’s performance if, rather subsequently eliminating duplicates, we avoid generating them in the first place. For this, we need to make sure that the coin values are in ascending order and use only coin values that are larger than or equal to the last coin value used so far. In other words, rather than generating all possible coin permutations, we will generate only the permutations in strict ascending order. And, as we’ll see, that’s a huge difference. We no longer need a SetHash to remove duplicates, an array will be sufficient to store the results. We no longer need to sort the coin values in each combination, we only need to sort the initial array of coin values. But since we no longer remove duplicates from the results, we also need to make sure there is no duplicate in the initial array of coin values.

use v6;

my $target_sum = shift @*ARGS;
my @coins = @*ARGS.sort.squish;
my @result;
find-sum(0, []);
.say for @result;

sub find-sum (Int $start, @allocated-so-far) {
    return if $start > $target_sum;
    if $start == $target_sum {
        push @result, join " ", @allocated-so-far;
        return;
    }
    my $last-coin = 0;
    $last-coin = @allocated-so-far[*-1] if defined @allocated-so-far[*-1];
    for @coins.grep({$_ >= $last-coin}) -> $coin {
        find-sum($start + $coin, (| @allocated-so-far, $coin));
    }
}

Remember that we had a runtime of about 27.4 second for a target value of 20 and four input coin values? We’re now down to about half a second with exactly the same input parameters:

$ time raku coin-sum2.raku 20 1 2 4 5
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
(67 lines omitted for brevity)
2 4 4 5 5
4 4 4 4 4
5 5 5 5

real    0m0,537s
user    0m0,030s
sys     0m0,030s

Now, of course, we are talking of a process where we have an exponential (or rather factorial) combinatorial explosion. There are some inherent limits to the input size even when you improve considerably the algorithm. For example, with a target value of 50 and a few more coin values, we get this:

$ time raku coin-sum2.raku 50 1 2 4 5 6 7 8 | wc
   8466  159909  328284

real    0m20,867s
user    0m0,015s
sys     0m0,108s

There may be some possible pure-Raku micro-optimizations, but, against a combinational explosion, they will necessarily be wiped out by just a slightly larger input. Parallel processing might bring a performance improvement, but only marginally with an exponential process. Similarly, using C code will definitely help, but only to a quite limited extent.

Coin Sums in Perl

I have explained in the Raku section above some very significant performance improvements between my initial and my final Raku implementation. I’ll port to Perl the performance-improved version (please refer to the Raku section above for explanations about performance improvements):

use strict;
use warnings;
use feature "say";

my $target_sum = shift;
my $prev = 0;
my @coins = grep { $_ > $prev and $prev = $_}  sort { $a <=> $b } @ARGV;
die "Need at least two integer values" unless @coins;
my @result;
find_sum(0, ());
say for @result;

sub find_sum {
    my ($start, @allocated_so_far) = @_;
    return if $start > $target_sum;
    if ($start == $target_sum) {
        push @result, join " ", @allocated_so_far;
    } else {
        my $last_coin = $allocated_so_far[-1] // 0;
        for my $coin (grep $_ >= $last_coin, @coins) {      
            find_sum($start + $coin, @allocated_so_far, $coin);
        }
    }
}

The resulting output is as expected:

$ perl coin-sum.pl 6 1 2 3
1 1 1 1 1 1
1 1 1 1 2
1 1 1 3
1 1 2 2
1 2 3
2 2 2
3 3

With a larger input (target value of 20), the Perl program is 6 to 7 times faster than the Raku program. I’m sorry to have to say that, but, quite obviously, for such a CPU-intensive problem, there is still quite a bit of room for improvement in terms of Raku performance (compared to Perl).

Task 2: Largest Rectangle Histogram

You are given an array of positive numbers @A.

Write a script to find the larget rectangle histogram created by the given array.

BONUS: Try to print the histogram as shown in the example, if possible.

Example 1:

Input: @A = (2, 1, 4, 5, 3, 7)

     7           #
     6           #
     5       #   #
     4     # #   #
     3     # # # #
     2 #   # # # #
     1 # # # # # #
     _ _ _ _ _ _ _
       2 1 4 5 3 7

Looking at the above histogram, the largest rectangle (4 x 3) is formed by columns (4, 5, 3 and 7).

Output: 12

Let me start with a comment. Depending on the input data, there can be two or more rectangles having the same maximum area. Since the requested output is just the size of the rectangle, we obviously don’t care when there are several rectangles. That being said, I’ll nonetheless add to the output information about the range of values that produced the largest rectangle (or one of them when there are more than one). This make it easier to check that the results are correct. It would be quite easy to change that and list all the rectangles when there are more than one, but I preferred to spend my time making a program in a language (in addition to Raku and Perl) that has never been used so far in the Perl Weekly Challenge (see below) and that probably nobody in the team knows (besides me).

Finally, for the bonus, I slightly changed the output to also include the subscripts of the input array, as this also makes it a bit easier to verify the result.

Largest Rectangle Histogram in Raku

The method used is really brute force: trying all possible rectangles of the histogram. For example, we start with the first column (with value 2 in the above histogram). From this column, we can derive two rectangles!

2 #
1 #

and:

1 # # # # # #

The second one is the winner so far. Then we go to the next column (with value 1), and will obviously not get anything better. We move next to the third column (with value 4) and can find two better candidates:

4     # # 
3     # # 
2     # # 
1     # #

and:

3     # # # #
2     # # # #
1     # # # #

As it turns out, the second solution above is the best for the input data, but we don’t know yet, so we need to continue the process until we have exhausted all the possibilities.

use v6;

my @a = @*ARGS.elems > 1 ?? @*ARGS !! (2, 1, 4, 5, 3, 7);
draw-histo(@a);
my ($area, @rectangle) = largest-rect(@a);
say "Largest rectangle is @rectangle[] with an area of $area.";


sub draw-histo (@in) {
    my $max-val = @in.max;
    say "  ", join "  ", 0..@in.end;
    say "  -" x @in.elems;
    for (1..$max-val).reverse -> $ordinate {
        print $ordinate;
        for 0..@in.end -> $i {
            print @in[$i] >= $ordinate ?? " # " !! "   ";
        }
        say "";
    }
    say "  =" x @in.elems;
    say "  ", join "  ", @in;
}

sub largest-rect (@in) {
    my $largest_area = 0;
    my @best-vals = 0, 0;
    for 0..^@in.end -> $i {
        for $i^..@in.end -> $j {
            my $area = ($j - $i + 1) * @in[$i..$j].min;
            # say "testing $i $j $area";
            if $area > $largest_area {
                $largest_area = $area;
                @best-vals = $i, $j;
            }
        }
    }
    return $largest_area, @best-vals;
}

This is the displayed output for two different input data sets:

$ raku largest_rect.raku
  0  1  2  3  4  5
  -  -  -  -  -  -
7                #
6                #
5          #     #
4       #  #     #
3       #  #  #  #
2 #     #  #  #  #
1 #  #  #  #  #  #
  =  =  =  =  =  =
  2  1  4  5  3  7
Largest rectangle is 2 5 with an area of 12.


$ raku largest_rect.raku 5 6 3 4 2 7 9 8
  0  1  2  3  4  5  6  7
  -  -  -  -  -  -  -  -
9                   #
8                   #  #
7                #  #  #
6    #           #  #  #
5 #  #           #  #  #
4 #  #     #     #  #  #
3 #  #  #  #     #  #  #
2 #  #  #  #  #  #  #  #
1 #  #  #  #  #  #  #  #
  =  =  =  =  =  =  =  =
  5  6  3  4  2  7  9  8
Largest rectangle is 5 7 with an area of 21.

Largest Rectangle Histogram in Perl

This is a port to Perl of my Raku program above. The only significant difference is that I had to write my own min subroutine (I know that there exist modules to do that, but I do not want to use off-the-shelf third-party software packages for programming challenges).

use strict;
use warnings;
use feature "say";

my @a = @ARGV > 1 ? @ARGV : ( 2, 1, 4, 5, 3, 7);
draw_histo(@a);
my ($area, @rectangle) = largest_rect(@a);
say "Largest rectangle is @rectangle with an area of $area.";

sub draw_histo {
    my @in = @_;
    my $max_val = $in[0];
    for my $i (1..$#in) {
        $max_val = $in[$i] if $in[$i] > $max_val;
    }
    say "\n  ", join "  ", 0..$#in;
    say "  -" x scalar @in;
    for my $ordinate (reverse 1..$max_val) {
        print $ordinate;
        for my $i (0..$#in) {
            print $in[$i] >= $ordinate ? " # " : "   ";
        }
        say "";
    }
    say "  =" x scalar @in;
    say "  ", join "  ", @in;
    say "";
}

sub min {
    my @vals = @_;
    my $min = shift @vals;
    for my $val (@vals) {
        $min = $val if $val < $min;
    }
    return $min;
}

sub largest_rect {
    my @in = @_;
    my $largest_area = 0;
    my @best_vals = (0, 0);
    for my $i (0..$#in -1) {
        for my $j ($i + 1..$#in) {
            my $area = ($j - $i + 1) * min @in[$i..$j];
            # say "testing $i $j $area";
            if ($area > $largest_area) {
                $largest_area = $area;
                @best_vals = ($i, $j);
            }
        }
    }
    return $largest_area, @best_vals;
}

This is the displayed output for three different input data sets:

$ perl largest_rect.pl 6 5 4 2 3 1 2

  0  1  2  3  4  5  6
  -  -  -  -  -  -  -
6 #
5 #  #
4 #  #  #
3 #  #  #     #
2 #  #  #  #  #     #
1 #  #  #  #  #  #  #
  =  =  =  =  =  =  =
  6  5  4  2  3  1  2

Largest rectangle is 0 2 with an area of 12.

$ perl largest_rect.pl 6 5 4 2 3 2 2

  0  1  2  3  4  5  6
  -  -  -  -  -  -  -
6 #
5 #  #
4 #  #  #
3 #  #  #     #
2 #  #  #  #  #  #  #
1 #  #  #  #  #  #  #
  =  =  =  =  =  =  =
  6  5  4  2  3  2  2

Largest rectangle is 0 6 with an area of 14.

$ perl largest_rect.pl

  0  1  2  3  4  5
  -  -  -  -  -  -
7                #
6                #
5          #     #
4       #  #     #
3       #  #  #  #
2 #     #  #  #  #
1 #  #  #  #  #  #
  =  =  =  =  =  =
  2  1  4  5  3  7

Largest rectangle is 2 5 with an area of 12.

Largest Rectangle Histogram in Gembase

Never heard about the Gembase programming language? I’m not surprised.

Gembase is a proprietary language originally developed in the 1980s and 1990s for accessing relational databases (Digital RDB, RMS on VMS, MS SQL Server, Sybase, and Oracle under Unix or Linux), developed initially by Ross Systems and then by CDC Software. It is quite similar in many respects to PL-SQL under Oracle. It is highly efficient for large databases, and it is very powerful and expressive in terms of database access and updates, report producing, ASCII menus, and so on. But, just as PL-SQL, it is quite poor as a general purpose programming languages.

Among its limitations and other low-expressive features, I can cite:

  • The while loop is the only looping construct, no for loops, no next statement, no last statement, etc.; this leads to quite a bit of boiler-plate code to manage the loop variables;
  • no hash or associative table (well, there are virtual tables, which are pretty rich and expressive, but not with the performance of hashes);
  • No regexes;
  • Arrays are global variables and cannot be passed as a parameter to a function (although individual array elements can); also, there is no way to populate an array directly with a list of values (except with the csv_split built-in function used in the script below; in that case, the array has a 1-based index);
  • The overall syntax looks a bit obsolete (Pascal-like).

Clearly, I would not seriously use Gembase for solving such a problem (just as I would not use PL-SQL), as this leads to a lot of boring code. Raku and Perl are far far better. I undertook this task for the sole purpose of the challenge.

I guess that most people will be able to understand the overall syntax, you just need to know a few unusual things:

  • Comments are introduced by the ! exclamation mark symbol;

  • Variable names start with a sigil, namely the # symbol;

  • The language is generally not case-sensitive, it is quite common to use upper case for PROCEDURE_FORM and END_FORM to make the program structure more visible; some people also use uppercase for keywords (while, ìf, etc.), but that tends to make the code less legible, because we end up with too many uppercase words;

  • Subroutine parameters are essentially passed by reference (meaning that any parameter modification within a subroutine will be propagated on the caller’s side);

  • & is the string concatenation operator and also the code line continuation operator, so that you can use && if you want to assign a string over two code lines;

  • error /text_only is a statement used here to print out a string to the screen with a cleaner result than the ordinary print function.

This is the Gembase code:

PROCEDURE_FORM MAIN (#p1)
    #array(0) = 0
    if (#p1 = "") #p1 = "2,1,4,5,3,7"
    ! Splits the input CSV into an #array of #count values (1-based index).
    ! Arrays are global variables and cannot be passed as function parameters
    #count = csv_split(#array, #p1)
    #largest = 0
    #best_i = 0
    #best_j = 0
    perform LARGEST_RECT(#count, #largest, #best_i, #best_j)
    #msg = "Largest rectangle is between indices " & #best_i - 1 & " and " & #best_j - 1 &&
            ". Area is " & #largest & "."
    error /text_only  (#msg)
    perform DRAW_HISTO (#count)  
END_FORM

PROCEDURE_FORM LARGEST_RECT (#nb_items, #largest_area, #best_i, #best_j)
    #i = 1
    while (#i < #nb_items)
        #j = #i + 1
        while (#j <= #nb_items)
            #k = #i
            #min = #array(#k)
            ! finding the minimal height within the index range
            while (#k <= #j)
                if (#array(#k) < #min) #min = #array(#k)
                #k = #k + 1
            end_while
            #area = (#j - #i + 1) * #min
            if (#area > #largest_area)
                #largest_area = #area
                #best_i = #i
                #best_j = #j
            end_if
            #j = #j + 1
        end_while
        #i = #i + 1
end_while
END_FORM

PROCEDURE_FORM DRAW_HISTO (#nb_items)
    #i = 1
    #max_val = #array(#i)
    while (#i <= #nb_items)
        if (#array(#i) > #max_val) #max_val = #array(#i)
        #i = #i + 1
    end_while
    #i = 1
    #line = ""
    while (#i <= #nb_items)
        #line = #line & "  " & #array(#i)
        #i = #i + 1
    end_while
    error /text_only  #line
    #i = 1
    #line = ""
    while (#i <= #nb_items)
        #line = #line & "  -"
        #i = #i + 1
    end_while
    error /text_only  #line
    #ordinate = #max_val
    while (#ordinate >= 1)
        #line = #ordinate
        #i = 1
        while (#i <= #nb_items)
            if (#array(#i) >= #ordinate)
                #line = #line & " # "
            else
                #line = #line & "   "
            end_if
            #i = #i + 1
        end_while
        error /text_only  #line
        #ordinate = #ordinate - 1
    end_while
    #line = ""
    #i = 1
    while (#i <= #nb_items) 
        #line = #line & "  ="
        #i = #i + 1
    end_while
    error /text_only  #line
    #line = ""
    #i = 1
    while (#i <= #nb_items)
        #line = #line & "  " & #array(#i)
        #i = #i + 1
    end_while
    error /text_only  #line
END_FORM

This is the output of this program:

Largest rectangle is between indices 2 and 5. Area is 12.

  2  1  4  5  3  7
  -  -  -  -  -  -
7                #
6                #
5          #     #
4       #  #     #
3       #  #  #  #
2 #     #  #  #  #
1 #  #  #  #  #  #
  =  =  =  =  =  =
  2  1  4  5  3  7

Wrapping up

The next week Perl Weekly Challenge will start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, September 6, 2020. And, please, also spread the word about the Perl Weekly Challenge if you can.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.