Perl Weekly Challenge 036: VIN Validation and the Knapsack Problem

VIN Validation

Write a program to validate given Vehicle Identification Number (VIN).

I followed the description at Wikipedia. Sometimes, it wasn’t exactly clear whether the described rule should be valid everywhere or just in a part of the world; the rules also developed with time, so older vehicles can bear VINs that would be considered invalid for a modern car.

Most of the validation is implemented in a single subroutine validate_vin. It takes two parameters, $vin and $sold: the second one says where the car was sold. "North America" and "China" are two values that trigger a different behaviour of the validator.

sub validate_vin {
    my ($vin, $sold) = @_;
    $sold //= "";

    die 'Invalid length or character'
        unless $vin =~ /^[0-9A-HJ-NPR-Z]{17}$/;

    die 'Invalid country code'
        if $vin =~ /^(?:A[P-Z0-9]|[BCD][S-Z0-9]
                    |[EF][L-Z0-9]|[GH]
                    |[NP][S-Z0-9]|S[05-9]
                    |T[02-9]|U[01-489A-G]|Z[06-9S-W]
                    |[39]0|8[03-9]
                    )/x;

    die 'Invalid North American VIS'
        if $sold eq 'North America' && $vin !~ /^.{12}[0-9]{5}/;

    die 'Invalid year' if $vin =~ /^.{9}[UZ0]/;

    die 'Invalid check'
        if ($sold eq 'North America' || $sold eq 'China')
        && ! check_digit($vin);
}

The first exception is thrown when the VIN contains invalid characters (only numbers and capital letters except for I, O, and Q are allowed) or when the VIN isn’t 17 character long.

The second exception optimises the table of countries into a single regex.

The third exception checks that the VIS part of the VIN is numeric. This rule only applies to the North America.

The fourth exception checks that the year code is valid (U, Z, and 0 aren’t allowed).

The last exception applies to North American and Chinese cars and implements the check-digit calculation. I extracted its implementation to a separate subroutine:

my %TRANSLITERATION = qw( A 1 B 2 C 3 D 4 E 5 F 6 G 7 H 8
                          J 1 K 2 L 3 M 4 N 5 P 7 R 9 S 2
                          T 3 U 4 V 5 W 6 X 7 Y 8 Z 9 );

my @WEIGHTS = qw( 8 7 6 5 4 3 2 10 0 9 8 7 6 5 4 3 2 );

sub check_digit {
    my ($vin) = @_;
    my @w = @WEIGHTS;
    my $remainder = sum(map shift(@w) * ($TRANSLITERATION{$_} || $_),
                        split //, $vin
                    ) % 11;
    $remainder = 'X' if 10 == $remainder;
    return $remainder eq substr $vin, 8, 1
}

As usually, here are some tests to verify the implementation. As the subroutine dies on an invalid VIN, we used Test::Exception to test it.

use Test::More tests => 19;
use Test::Exception;

throws_ok { validate_vin('1' x 18) } qr/Invalid length/, 'too long';
throws_ok { validate_vin('1' x 16) } qr/Invalid length/, 'too short';

throws_ok { validate_vin('O' . '1' x 16) }
          qr/Invalid.*character/, 'capital O';
throws_ok { validate_vin('I' . '1' x 16) }
          qr/Invalid.*character/, 'capital I';
throws_ok { validate_vin('Q' . '1' x 16) }
          qr/Invalid.*character/, 'capital Q';

lives_ok { validate_vin('1G145678XABCX2345') } 'valid outside NA';
throws_ok { validate_vin('1G145678XABCX2345', 'North America') }
          qr/North American VIS/, 'NA vis';

throws_ok { validate_vin('111111111U1111111') }
          qr/Invalid year/, 'invalid year';

lives_ok { validate_vin('1' x 17) } 'straight-ones';
lives_ok { validate_vin('1M8GDM9AXKP042788') } 'hypothetical';
lives_ok { validate_vin('1M8GDM9AXKP042788', 'North America') }
    'hypothetical NA';
lives_ok { validate_vin('5GZCZ43D13S812715') } 'valid example';
lives_ok { validate_vin('5GZCZ43D13S812715', 'North America') }
    'valid NA example';

lives_ok { validate_vin('SGZCZ43D13S812715') } 'outside NA and China';
throws_ok { validate_vin('SGZCZ43D13S812715', 'China') }
          qr/check/, 'invalid check';

lives_ok { validate_vin('WP0ZZZ99ZTS392124') } 'outside NA and China';
throws_ok { validate_vin('WP0ZZZ99ZTS392124', 'China') }
          qr/check/, 'invalid check';

lives_ok { validate_vin('KLATF08Y1VB363636') } 'outside NA and China';
throws_ok { validate_vin('KLATF08Y1VB363636', 'China') }
          qr/check/, 'invalid check';

The Knapsack Problem

There are 5 colour coded boxes with varying weights and amounts in GBP. Which boxes should be chosen to maximise the amount of money while still keeping the overall weight under or equal to 15 kgs?
            R: (weight = 1 kg, amount = £1)
            B: (weight = 1 kg, amount = £2)
            G: (weight = 2 kg, amount = £2)
            Y: (weight = 12 kg, amount = £4)
            P: (weight = 4 kg, amount = £10)
Bonus task, what if you were allowed to pick only 2 boxes or 3 boxes or 4 boxes? Find out which combination of boxes is the most optimal?

The Knapsack problem is famous because it’s easy to understand but hard to solve in the general case. But five boxes are a set small enough for a brute force solution, which is what we’ll show.

To iterate over all the possible solutions (we assume no box can be repeated), let’s use an indicator function: let’s assign 1 to each box that was picked, and 0 to the ones not being chosen. To iterate over all possibilities, we can just start from the binary 00000 (no box selected) and increment the number up to 11111 (all boxes selected).

#!/usr/bin/perl
use warnings;
use strict;

use List::Util qw{ sum };

my %weight = (
    R => 1,
    B => 1,
    G => 2,
    Y => 12,
    P => 4);

my %price = (
    R => 1,
    B => 2,
    G => 2,
    Y => 4,
    P => 10);

my $MAX_WEIGHT = 15;

my @boxes = keys %weight;

sub binary_plusplus {
    my ($n) = @_;
    my $p = 0;
    $n->[$p++] = 0 while $n->[$p] != 0;
    $n->[$p] = 1;
}

sub pick {
    my ($max_count) = @_;
    my @mask = ('0') x keys %weight;
    my %best = (mask => [@mask], price => 0);
    while (grep ! $_, @mask) {
        next if $max_count < grep $_, @mask;

        my $w = sum(map $mask[$_] * $weight{ $boxes[$_] },
                    0 .. $#mask);
        my $p = sum(map $mask[$_] *  $price{ $boxes[$_] },
                    0 .. $#mask);

        next if $w > $MAX_WEIGHT;

        %best = (mask => [@mask], price => $p)
            if $p > $best{price};

    } continue {
        binary_plusplus(\@mask);
    }

    return [map $best{mask}[$_]
            ? (keys %weight)[$_]
            : (), 0 .. $#mask]
}

And again, let’s verify the implementation with tests. We use bag from Test::Deep because we don’t care about the order of the boxes in the solution. Also note that for 2 boxes, there are two different best solutions.

use Test::More tests => 5;
use Test::Deep;

cmp_deeply pick(5), bag(qw( R B G P ));

cmp_deeply pick(4), bag(qw( R B G P ));

cmp_deeply pick(3), bag(qw( B P G ));

cmp_deeply pick(2), any(bag(qw( B P )), bag(qw( G P )));

cmp_deeply pick(1), [qw[ P ]];

Leave a comment

About E. Choroba

user-pic I blog about Perl.