Perl Weekly Challenge 36: Vehicle Identification Numbers (VIN) and the Knapsack Problem

These are some answers to the Week 36 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Task # 1: Vehicle Identification Numbers (VIN)

Write a program to validate given Vehicle Identification Number (VIN). For more information, please checkout wikipedia.

From the Wikipedia article, it appears that VINs are made up of 17 digits and upper-case letters, with the exception of letters I (i), O (o) and Q (q), to avoid confusion with numerals 0, 1, and 9. There are some additional rules that only applicable to certain areas of the world but are not internationally recognized.

Vehicle Identification Numbers in Perl 5

We write a simple validate subroutine that returns a true value (1) if the passed parameter complies with the above rules for VINs and a false value (0) otherwise.

In addition, we write a test suite using the Test::More core testing framework and containing 16 test cases. The ok function is fine for checking if a Boolean value is true, but for testing that the function returns a 0 (false value), we need to use the is function of Test::More, because it unfortunately has no ko or nok function that would pass a test when the expression returns a false value.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Test::More tests => 16;

sub validate {
    my $vin = shift;
    return 0 if $vin =~ /[OIQ]/;
    return 1 if $vin =~ /^[A-Z0-9]{17}$/;
    return 0;
}

ok validate("A" x 17),  "17 A's";
ok validate(1 x 17)  ,  "17 digits";
is validate("AEIOU")   , 0, "Five vowels";
is validate(1234567890), 0, "Ten digits";
is validate("1234AEIOU5678901"),   0, "Sixteen digits or letters";
is validate("12345678901234567"),  1, "17 digits";
is validate("1234567890123456Q"),  0, "16 digits and a Q";
is validate("1234567890123456O"),  0, "16 digits and a O";
is validate("1234567890123456I"),  0, "16 digits and a I";
is validate("Q1234567890123456"),  0, "A Q and 16 digits";
is validate("I1234567890123456"),  0, "An I and 16 digits";
is validate("ABCD4567890123456"),  1, "17 digits and letters";
is validate("ABef4567890123456"),  0, "Digits and some lower case letters";
is validate("ABE?4567890123456"),  0, "A non alphanumerical character";
is validate("ABCD4567 90123456"),  0, "A space";
is validate("ABCD45678901234567"), 0, "More than 17 characters";

Running the program shows that all test pass:

$ perl vin.pl
1..16
ok 1 - 17 A's
ok 2 - 17 digits
ok 3 - Five vowels
ok 4 - Ten digits
ok 5 - Sixteen digits or letters
ok 6 - 17 digits
ok 7 - 16 digits and a Q
ok 8 - 16 digits and a O
ok 9 - 16 digits and a I
ok 10 - A Q and 16 digits
ok 11 - An I and 16 digits
ok 12 - 17 digits and letters
ok 13 - Digits and some lower case letters
ok 14 - A non alphanumerical character
ok 15 - A space
ok 16 - More than 17 characters

In North America, the ninth position in a VIN is a check digit i.e. a number calculated from all other characters. Although this is not explicitly requested in the task, we’ll make a second version of our program also verifying the check digit, as a bonus. The check_digit subroutine splits the input string, translates the characters into numbers, multiplies each number by the weight assigned to its rank, sums up all the results, computes the remainder of its division by 11, and replaces the remainder by “X” if it is found to be 10.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub validate {
    my $vin = shift;
    return 0 if $vin =~ /[OIQ]/;
    return 0 unless $vin =~ /^[A-Z0-9]{17}$/;
    return check_digit($vin);
}

sub check_digit {
    my $vin = shift;
    my %translations = (
     A => 1, B => 2, C => 3, D => 4, E => 5, F => 6, G => 7, H => 8,
     J => 1, K => 2, L => 3, M => 4, N => 5, P => 7, R => 9, S => 2,
     T => 3, U => 4, V => 5, W => 6, X => 7, Y => 8, Z => 9, 
    );
    $translations{$_} = $_ for 0..9;

    my @weights = 
        (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2);
    my $i = 0;
    my $sum = 0;
    for my $char (split //, $vin) {
        $sum += $translations{$char} * $weights[$i++];
    }
    my $mod = $sum % 11;
    $mod = 'X' if $mod == 10;
    return 1 if $mod eq substr $vin, 8, 1;
    return 0;
}

my $vin = shift // "1M8GDM9AXKP042788";
say validate($vin) ? "Correct" : "Wrong";

Running this program produces the following output:

$ perl vin.pl 1M8GDM9AXKP042788
Correct

$ perl vin.pl 1M8GDM9AXKP042789
Wrong

$ perl vin.pl
Correct

Vehicle Identification Numbers in Raku (formerly known as Perl 6)

We can just do almost the same thing in Raku.

Note that the Raku Test framework has a nok function that makes it possible to test directly a Boolean value, rather than comparing to some values.

use v6;
use Test;

sub validate ($vin) {
    return False if $vin ~~ /<[OIQ]>/;
    return True if $vin ~~ /^ <[A..Z0..9]> ** 17 $/;
    return False;
}

plan 16;

ok  validate("A" x 17),   "17 A's";
ok  validate(1 x 17),     "17 digits";
nok validate("AEIOU"),    "Five vowels";
nok validate(1234567890), "Ten digits";
nok validate("1234AEIOU5678901"),   "sixteen digits or letters";
ok  validate("12345678901234567"),  "17 digits";
nok validate("1234567890123456Q"),  "16 digits and a Q";
nok validate("1234567890123456O"),  "16 digits and a O";
nok validate("1234567890123456I"),  "16 digits and a I";
nok validate("Q1234567890123456"),  "A Q and 16 digits";
nok validate("I1234567890123456"),  "An I and 16 digits";
ok  validate("ABCD4567890123456"),  "17 digits and letters";
nok validate("ABef4567890123456"),  "Digits and some lower case letters";
nok validate("ABE?4567890123456"),  "A non alphanumerical character";
nok validate("ABCD4567 90123456"),  "A space";
nok validate("ABCD45678901234567"), "More than 17 characters";

Running the program shows that all test pass:

$ perl6 vin.p6
1..16
ok 1 - 17 A's
ok 2 - 17 digits
ok 3 - Five vowels
ok 4 - Ten digits
ok 5 - sixteen digits or letters
ok 6 - 17 digits
ok 7 - 16 digits and a Q
ok 8 - 16 digits and a O
ok 9 - 16 digits and a I
ok 10 - A Q and 16 digits
ok 11 - An I and 16 digits
ok 12 - 17 digits and letters
ok 13 - Digits and some lower case letters
ok 14 - A non alphanumerical character
ok 15 - A space
ok 16 - More than 17 characters

As in Perl 5, we’ll implement the North America check digit feature:

use v6;

sub validate (Str $vin) {
    return False if $vin ~~ /<[OIQ]>/;
    return False unless $vin ~~ /^ <[A..Z0..9]> ** 17 $/;
    return check-digit $vin;
}

sub check-digit (Str $vin) {
    my %translations = 
     A => 1, B => 2, C => 3, D => 4, E => 5, F => 6, G => 7, H => 8,
     J => 1, K => 2, L => 3, M => 4, N => 5, P => 7, R => 9, S => 2,
     T => 3, U => 4, V => 5, W => 6, X => 7, Y => 8, Z => 9;
    %translations{$_} = $_ for 0..9;
    my @weights = 8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2;
    my $i = 0;
    my $sum = sum map { %translations{$_} * @weights[$i++]}, $vin.comb;
    my $mod = $sum % 11;
    $mod = 'X' if $mod == 10;
    return True if $mod eq substr $vin, 8, 1;
    return False;
}

sub MAIN (Str $vin = "1M8GDM9AXKP042788") {
    say validate($vin) ?? "Correct" !! "Wrong"; 
}

Running the program displays the following output:

$ perl6 vin.p6
Correct

$ perl6 vin.p6 1M8GDM9AXKP042788
Correct

$ perl6 vin.p6 1M8GDM9AXKP042789
Wrong

Task # 2: The Knapsack Problem

Write a program to solve Knapsack Problem.

There are 5 color coded boxes with varying weights and amounts in GBP. Which boxes should be chosen to maximize the amount of money while still keeping the overall weight under or equal to 15 kg?

R: (weight = 1 kg, amount = £1)
B: (weight = 1 kg, amount = £2)
G: (weight = 2 kg, amount = £2)
Y: (weight = 12 kg, amount = £4)
P: (weight = 4 kg, amount = £10)

Bonus task, what if you were allowed to pick only 2 boxes or 3 boxes or 4 boxes? Find out which combination of boxes is the most optimal?

The knapsack problem or rucksack problem is a well-known problem in combinatorial optimization: given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items. In this specific case, this is what is sometimes called the 0-1 knapsack problem, where you can chose only one of each of the listed items.

I will directly take the “bonus” version of the problem, as it seems simpler to take this constraint right from the beginning.

The knapsack problem is known to be a at least an NP-Complete problem (and the optimization problem is NP-Hard). This means that there is no known polynomial algorithm which can tell, given a solution, whether it is optimal. There are, however, some algorithms that can solve the problem in pseudo-polynomial time, using dynamic programming. However, with a set of only five boxes, we can run a so-called brute-force algorithm, that is try all possible solutions to find the best. A better algorithm would probably be needed to manage 30 or more boxes, but we’re given only 5 boxes, and trying to find a better algorithm for only five boxes would be, in my humble view, a case of over-engineering.

The Knapsack Problem in Perl 5

To start with, we populate a %boxes hash of hashes with the box colors as keys, and their respective weights and values.

The most immediate solution to test all boxes combinations would be to use five nested loops, but that’s tedious and ugly, and we would need to neutralize some of the loops for satisfying the bonus task with only 2, 3, or 4 boxes.

I prefer to implement a recursive solution where the parameters to the recursive try_one subroutine govern the number of loops that will be performed. These parameters are as follows: * Current cumulative weight of the selected boxes; * Current total value of the selected boxes; * Maximum number of boxes to be selected (for the bonus) * A string listing the boxes used so far in the current solution; * A list of the boxes still available;

For the first call of try_one, we have the following parameters: 0 for the weight, 0 for the value, the maximum number of boxes to be used is passed as a parameter to the script or, failing a parameter, defaulted to 5, an empty string for the list of boxes, and the list of box colors.

The recursion base case (where recursion should stop) is reached when the current weight exceed 15 or when the number of available boxes left reaches 0.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my %boxes = (
    "R" => { "w" => 1,  val => 1  },
    "B" => { "w" => 1,  val => 2  },
    "G" => { "w" => 2,  val => 2  },
    "Y" => { "w" => 12, val => 4  },
    "P" => { "w" => 4,  val => 10 },
);
my $start_nb_boxes = shift // 5;
my $max_val = 0;
my $max_boxes;

sub try_one {
    my ($cur_weight, $cur_val, $num_boxes, $boxes_used, @boxes_left) = @_;
    if ($cur_val > $max_val) {
        $max_val = $cur_val;
        $max_boxes = $boxes_used;
    }
    for my $box (@boxes_left) {
        my $new_cur_weight = $cur_weight + $boxes{$box}{w};
        next if $new_cur_weight > 15 or $num_boxes <= 0;
        my @new_boxes_left = grep $_ ne $box, @boxes_left;
        my $new_box_used = $boxes_used ? $boxes_used . "-$box" : $box;
        try_one ($new_cur_weight, $cur_val + $boxes{$box}{val}, $num_boxes -1, $new_box_used, @new_boxes_left);
    }
}
try_one (0, 0, $start_nb_boxes, "", keys %boxes);        
say "Max: $max_val, Boxes:  $max_boxes";

Note that we’re using two global variables to store the maximum value and the corresponding list of boxes. This is often frowned upon as contrary to the best practices, and often rightly so, but I consider that these variables are really global to the program (they keep track of the best solution so far) and not part of any specific recursive call of the subroutine. We could easily pass these variables around back and forth as parameters to and return values of the recursive calls, but that would make the program more complicated (and probably slightly slower) with no obvious benefit. Best practices are good to follow when they make sense, but, in my view, they shouldn’t become a bureaucratic straight jacket, and I don’t think we should try to bend over backward to follow them when they make things significantly more complicated than needed. But that’s only my humble opinion, you may disagree with that.

Running this script with no parameter yields a solution with four boxes (predictably all boxes except “Y” that has a very high weight):

$ perl boxes.pl
Max: 15, Boxes:  R-G-B-P

And, for the bonus we run the same program with parameters between 1 and 4:

$ perl boxes.pl 4
Max: 15, Boxes:  B-P-G-R

$ perl boxes.pl 3
Max: 14, Boxes:  B-P-G

$ perl boxes.pl 2
Max: 12, Boxes:  B-P

$ perl boxes.pl 1
Max: 10, Boxes:  P

One of the weaknesses of the recursive solution above is that we are testing permutations (i.e. subsets of the data where the order in which the boxes are selected matters) of elements, rather than combinations (where the order doesn’t matter), and we of course don’t care about the order in which we pick the boxes. So our program is doing too much work, because it’s testing far too many cases. It doesn’t really matter for a small set of 5 boxes, as we obtain the result in significantly less than a tenth of a second:

$ time perl boxes.pl
Max: 15, Boxes:  P-B-G-R

real    0m0,078s
user    0m0,000s
sys     0m0,015s

But it would still be nicer to test only combinations, as this would scale better for larger data sets. To get combinations, we can just retain only permutations that are in a given order, for example in alphabetic order, and filter out the others. We add one parameter to our recursive subroutine, $last_box_used, to that we can compare each box in the for loop with it and only keep those where box comes after in the alphabetic order. And we make our first call of the try_onesubroutine with an additional dummy parameter, “A”, which comes before any of the boxes.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my %boxes = (
    "R" => { "w" => 1,  val => 1  },
    "B" => { "w" => 1,  val => 2  },
    "G" => { "w" => 2,  val => 2  },
    "Y" => { "w" => 12, val => 4  },
    "P" => { "w" => 4,  val => 10 },
);
my $start_nb_boxes = shift // 5;
my $max_val = 0;
my $max_boxes;

sub try_one {
    my ($cur_weight, $cur_val, $num_boxes, $boxes_used, $last_box_used, @boxes_left) = @_;
    if ($cur_val > $max_val) {
        $max_val = $cur_val;
        $max_boxes = $boxes_used;
    }
    for my $box (@boxes_left) {
        next if $box lt $last_box_used;
        my $new_cur_weight = $cur_weight + $boxes{$box}{w};
        next if $new_cur_weight > 15 or $num_boxes <= 0;
        my @new_boxes_left = grep $_ ne $box, @boxes_left;
        my $new_box_used = $boxes_used ? $boxes_used . "-$box" : $box;
        try_one ($new_cur_weight, $cur_val + $boxes{$box}{val}, $num_boxes -1, $new_box_used, $box, @new_boxes_left);
    }
}
try_one (0, 0, $start_nb_boxes, "", "A", sort keys %boxes);        
say "Max: $max_val, Boxes:  $max_boxes";

This runs slightly faster that our previous version of the script:

$ time perl boxes.pl 5
Max: 15, Boxes:  B-G-P-R

real    0m0,062s
user    0m0,000s
sys     0m0,030s

Such an optimization is of course useless with such a small input dataset (and such short run times), but it might help quite significantly if we had larger input.

The Knapsack Problem in Raku

As a starting point, we can adapt the improved version of our Perl 5 recursive subroutine to Raku:

use v6;

constant %boxes = (
    "R" => { "w" => 1,  val => 1  },
    "B" => { "w" => 1,  val => 2  },
    "G" => { "w" => 2,  val => 2  },
    "Y" => { "w" => 12, val => 4  },
    "P" => { "w" => 4,  val => 10 },
);

sub MAIN (UInt $start-nb-boxes = 5) {
    my @boxes = keys %boxes;
    my $*max-val = 0;
    my $*max-boxes = "";
    try-one(0, 0, $start-nb-boxes, "", "A", @boxes);        
    say "Max: $*max-val, Boxes:  $*max-boxes";
    say now - INIT now;
}

sub try-one ($cur-weight, $cur-val, $num-boxes, $boxes-used, $last-box-used, @boxes-left) {
    if $cur-val > $*max-val {
        $*max-val = $cur-val;
        $*max-boxes = $boxes-used;
    }
    for @boxes-left -> $box {
        next if $box lt $last-box-used;
        my $new-cur-weight = $cur-weight + %boxes{$box}{'w'};
        next if $new-cur-weight > 15 or $num-boxes <= 0;
        my @new-boxes-left = grep { $_ ne $box}, @boxes-left;
        my $new-box-used = $boxes-used ?? $boxes-used ~ "-$box" !! $box;
        try-one $new-cur-weight, $cur-val + %boxes{$box}{'val'}, $num-boxes -1, $new-box-used, $box, @new-boxes-left;
    }
}

This are some examples of output:

$ perl6 boxes.p6
Max: 15, Boxes:  B-G-P-R
0.0099724

$ perl6 boxes.p6 4
Max: 15, Boxes:  B-G-P-R
0.0209454

$ perl6 boxes.p6 3
Max: 14, Boxes:  B-G-P
0.01895075

$ perl6 boxes.p6 2
Max: 12, Boxes:  B-P
0.0109711

Note that the overall run time (as measured by the Unix time command is about 0.4 second, so much more than the Perl 5 equivalent), but the execution time of the script itself is between 10 and 20 milliseconds, so that most of the overall run time is presumably taken by compilation and start up time.

But Raku offers the built-in combinations routine that can make our program shorter and simpler. It will return a list (really a Seq) of all possible combinations of the input list or array. You can even specify the number of items, or, even better, a range for the numbers of items in each combinations; this will enable us to answer the bonus question by specifying the maximal number of boxes, and also to remove from the output the empty list (which may otherwise generate errors or warnings). The find-best subroutine does most of the work: the first statement populates a @valid-candidates array with combinations not exceeding the maximal weight, along with their total respective values, and the next statement returns the maximal value combination.

use v6;

constant %boxes = (
    "R" => { "w" => 1,  val => 1  },
    "B" => { "w" => 1,  val => 2  },
    "G" => { "w" => 2,  val => 2  },
    "Y" => { "w" => 12, val => 4  },
    "P" => { "w" => 4,  val => 10 },
);
sub MAIN (UInt $max-nb = 5) {
    my ($best, $max) = find-best %boxes.keys.combinations: 1..$max-nb;
    say "Max: $max; ", $best;
}
sub find-best (@candidates) {
    my @valid-candidates = gather for @candidates -> $cand {
        take [ $cand, $cand.map({ %boxes{$_}{'val'}}).sum ] 
            if $cand.map({ %boxes{$_}{'w'}}).sum <= 15;
    }
    return  @valid-candidates.max({$_[1]});
}

The output is the same as before:

$ perl6 boxes2.p6
Max: 15; (R G B P)

$ perl6 boxes2.p6 5
Max: 15; (R P B G)

$ perl6 boxes2.p6 4
Max: 15; (B G P R)

$ perl6 boxes2.p6 3
Max: 14; (B G P)

$ perl6 boxes2.p6 2
Max: 12; (P G)

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, December 8. And, please, also spread the word about the Perl Weekly Challenge if you can.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.