Perl Weekly Challenge 050: Merge Intervals and Noble Integer

Merge Intervals

Write a script to merge the given intervals where ever possible.
[2,7], [3,9], [10,12], [15,19], [18,22]

The script should merge [2, 7] and [3, 9] together to return [2, 9].

Similarly it should also merge [15, 19] and [18, 22] together to return [15, 22].

The final result should be something like below:

[2, 9], [10, 12], [15, 22]

This sounds so similar to PWC 039 I first thought I could solve it in the same way. Unfortunately, Set::IntSpan gives a different result:

#!/usr/bin/perl
use warnings;
use strict;

use Set::IntSpan;

my @intervals = ([2, 7], [3, 9], [10, 12], [15, 19], [18, 22]);

my $set = 'Set::IntSpan'->new([@intervals]);

print $set; # 2-12,15-22

The reason is that the module only considers integers. There’s no integer between 9 and 10, so the spans 2-9 and 10-12 can be merged into one span 2-12.

Instead of searching CPAN for a module that gives the right output, I decided to implement a solution myself. It turned out to be more complex than I thought.

As usually, I started with test cases, trying to capture all the possible arrangements of two or three intervals:

#!/usr/bin/perl
use warnings;
use strict;

use Test::More;

sub test {
    my ($init, $expected) = @_;
    my $i = 'MyInterval'->new;
    $i->insert(@$_) for @{ $init };
    is_deeply $i, $expected,
        join ', ', map do {local $" = ':'; "@$_"}, @$init;
}

test([[1,2]], [[1,2]]);
test([[1,2],[3,4]], [[1,2],[3,4]]);
test([[3,4],[1,2]], [[1,2],[3,4]]);
#                                 1 2 3 4 5 6
#                                   |-----|
test([[2,5],[1,2]], [[1,5]]);  #  |-|
test([[2,5],[1,3]], [[1,5]]);  #  |---|
test([[2,5],[1,5]], [[1,5]]);  #  |-------|
test([[2,5],[1,6]], [[1,6]]);  #  |---------|
test([[2,5],[2,3]], [[2,5]]);  #    |-|
test([[2,5],[2,5]], [[2,5]]);  #    |-----|
test([[2,5],[2,6]], [[2,6]]);  #    |------|
test([[2,5],[3,4]], [[2,5]]);  #      |-|
test([[2,5],[3,5]], [[2,5]]);  #      |---|
test([[2,5],[3,6]], [[2,6]]);  #      |-----|
test([[2,5],[5,6]], [[2,6]]);  #          |-|

test([[1,3],[5,7],[-1,0]], [[-1,0],[1,3],[5,7]]);
test([[1,3],[5,7],[-1,1]], [[-1,3],[5,7]]);
test([[1,3],[5,7],[-1,2]], [[-1,3],[5,7]]);
test([[1,3],[5,7],[-1,3]], [[-1,3],[5,7]]);
test([[1,3],[5,7],[-1,4]], [[-1,4],[5,7]]);
test([[1,3],[5,7],[-1,5]], [[-1,7]]);
test([[1,3],[5,7],[-1,6]], [[-1,7]]);
test([[1,3],[5,7],[-1,7]], [[-1,7]]);
test([[1,3],[5,7],[-1,8]], [[-1,8]]);

test([[1,3],[5,7],[1,2]], [[1,3],[5,7]]);
test([[1,3],[5,7],[1,3]], [[1,3],[5,7]]);
test([[1,3],[5,7],[1,4]], [[1,4],[5,7]]);
test([[1,3],[5,7],[1,5]], [[1,7]]);
test([[1,3],[5,7],[1,6]], [[1,7]]);
test([[1,3],[5,7],[1,7]], [[1,7]]);
test([[1,3],[5,7],[1,8]], [[1,8]]);

test([[1,3],[5,7],[2,2]], [[1,3],[5,7]]);
test([[1,3],[5,7],[2,3]], [[1,3],[5,7]]);
test([[1,3],[5,7],[2,4]], [[1,4],[5,7]]);
test([[1,3],[5,7],[2,5]], [[1,7]]);
test([[1,3],[5,7],[2,6]], [[1,7]]);
test([[1,3],[5,7],[2,7]], [[1,7]]);
test([[1,3],[5,7],[2,8]], [[1,8]]);

test([[1,3],[5,7],[3,3]], [[1,3],[5,7]]);
test([[1,3],[5,7],[3,4]], [[1,4],[5,7]]);
test([[1,3],[5,7],[3,5]], [[1,7]]);
test([[1,3],[5,7],[3,6]], [[1,7]]);
test([[1,3],[5,7],[3,7]], [[1,7]]);
test([[1,3],[5,7],[3,8]], [[1,8]]);

test([[1,3],[5,7],[4,4]], [[1,3],[4,4],[5,7]]);
test([[1,3],[5,7],[4,5]], [[1,3],[4,7]]);
test([[1,3],[5,7],[4,6]], [[1,3],[4,7]]);
test([[1,3],[5,7],[4,7]], [[1,3],[4,7]]);
test([[1,3],[5,7],[4,8]], [[1,3],[4,8]]);

test([[1,3],[5,7],[5,5]], [[1,3],[5,7]]);
test([[1,3],[5,7],[5,6]], [[1,3],[5,7]]);
test([[1,3],[5,7],[5,7]], [[1,3],[5,7]]);
test([[1,3],[5,7],[5,8]], [[1,3],[5,8]]);

test([[1,3],[5,7],[6,6]], [[1,3],[5,7]]);
test([[1,3],[5,7],[6,7]], [[1,3],[5,7]]);
test([[1,3],[5,7],[6,8]], [[1,3],[5,8]]);

test([[1,3],[5,7],[7,7]], [[1,3],[5,7]]);
test([[1,3],[5,7],[7,8]], [[1,3],[5,8]]);

test([[1,3],[5,7],[8,8]], [[1,3],[5,7],[8,8]]);

I then tried to implement the following idea: let’s keep the resulting intervals in an array, each interval is represented as a nested array of two elements. When merging a new interval, it should be possible to calculate the indices of the intervals to remove by splice. Unfortunately, there are lots of edge cases.

I proceeded in two steps. In the first step, we identify where the new interval starts and ends. The positions have three different representations: [in => 2] means the start or end lies inside the interval at index 2, [before => 3] means it doesn’t fall into any interval, but before the intervals at index 3; the third possibility is [after => 4] where index 4 corresponds to the last index in the array. For example, if we’re merging [4, 8] into

 [[1, 2], [3, 4], [6, 7], [9, 10]]
              ^----------^

the positions will be [in => 1] and [after => 2]. In the next step, this will be translated into changing 4 into 8 and removing the element [6, 7].

package MyInterval;

sub new { bless [], shift }

sub _where {
    my ($self, $n) = @_;
    my $pos;
    if ($n >= $self->[$#$self][0]) {
        $pos = $n > $self->[$#$self][1]
               ? [after => $#$self]
               : [in => $#$self];
    } else {
        $pos = (grep $n <= $self->[$_][1], 0 .. $#$self)[0];
        $pos = $self->[$pos][0] <= $n
             ? [in => $pos] : [before => $pos];
    }
    return $pos
}

sub insert {
    my ($self, $from, $to) = @_;
    unless (@$self) {
        @$self = ([$from, $to]);
        return
    }

    my $i = $self->_where($from);
    my $j = $self->_where($to);

    if ($i->[0] eq 'after') {
        push @$self, [$from, $to];

    } elsif ($i->[1] == $j->[1]) {
        if ('before' eq $j->[0]) {
            splice @$self, $j->[1], 0, [$from, $to];

        } else {
            $self->[ $j->[1] ][1] = $to   if 'after' eq $j->[0];
            $self->[ $i->[1] ][0] = $from if 'in' ne $i->[0];
        }

    } else {
        my ($x, $y) = ($i->[1], $j->[1] - ('before' eq $j->[0]));
        $self->[ $i->[1] ][1] = 'before' ne $j->[0]
            ? $self->[ $j->[1] ][1]
            : $to;
        splice @$self, $x + 1, $y - $x if $x < $y;

        $self->[$x][0] = $from if $i->[0] eq 'before';
        $self->[$x][1] = $to   if $j->[0] eq 'after';
    }
}

My original solution was a bit different and I wasn’t sure it worked correctly. So I decided to implement a different solution and compare the results.

We’ll represent the starts and ends of the intervals as keys in a hash. The values will represent the role of the key: when the key starts an interval, we’ll set its first bit; when it ends an interval, we’ll set its second bit; when it forms an interval of the form [$n, $n], we’ll set its third bit. For all the numbers inside an interval, we’ll set their first two bits.

When generating the output, we just sort the keys and skip those whose first two bits are set. The third bit needs handling only when we aren’t searching for an end of an interval.

package MyIntervalHash;

use enum 'BITMASK:' => qw( LEFT RIGHT SINGLE );

sub new { bless {}, shift }

sub insert {
    my ($self, $from, $to) = @_;
    $self->{$from} |= SINGLE, return if $from == $to;
    $self->{$from} |= LEFT;
    $self->{$_} = LEFT | RIGHT for $from + 1 .. $to - 1;
    $self->{$to} |= RIGHT;
}

sub out {
    my ($self) = @_;
    my @r;
    for my $k (sort { $a <=> $b } keys %$self) {
        if (($self->{$k} & (LEFT | RIGHT)) == LEFT) {
            push @r, [$k];
        } elsif (($self->{$k} & (LEFT | RIGHT)) == RIGHT) {
            push @{ $r[-1] }, $k
        } elsif ((! @r || 1 != @{ $r[-1] }) && ($self->{$k} == SINGLE)) {
            push @r, [$k, $k];
        }
    }
    return \@r
}

To compare the two algorithms, I wrote a simple program that generated random lists of intervals, ran both the algorithms and compared the outputs:

use Test::More;
use Test::Deep;
use Data::Dumper;
while (1) {
    my $l = 1 + int rand 12;
    my @intervals = map [sort {$a <=> $b} int rand 20, int rand 20], 1 .. $l;

    my $i1 = MyInterval->new;
    my $i2 = MyIntervalHash->new;
    $i1->insert(@$_), $i2->insert(@$_) for @intervals;

    warn Dumper $i1, $i2->out;

    cmp_deeply $i1, noclass($i2->out)
        or die Dumper \@intervals;
}

When there was a different result, the program stopped and I fixed one of the algorithms. It took several iterations of this process to finally fix both the algorithms, I added the failed cases to my test suite:

test([[1,2],[5,6],[3,4]], [[1,2],[3,4],[5,6]]);
test([[1,2],[5,6],[2,5]], [[1,6]]);
test([[1,1],[2,2],[3,3]], [[1,1],[2,2],[3,3]]);

test([[0,6],[7,8],[12,19],[3,8]],[[0,8],[12,19]]);
test([[12, 14], [15, 19], [7, 8], [1, 12]],
     [[1,14],[15,19]]);
test([[12, 17], [18, 18], [9, 9], [5, 17]],
     [[5,17],[18,18]]);

Plus, of course, the initial example:

test([[2, 7], [3, 9], [10, 12], [15, 19], [18, 22]],
     [[2, 9], [10, 12], [15, 22]]);

I also wanted to compare the performance of the algorithms. I ran the following benchmark:

use Time::HiRes qw{ gettimeofday tv_interval };

say STDERR 'Preparing data...';
my @meta_i = map {
    my $l = 1 + int rand 2000;
    [ map [sort {$a <=> $b} int rand 20, int rand 20], 1 .. $l]
} 1 .. 10_000;

say STDERR 'Benchmarking...';
for my $class (qw( MyInterval MyIntervalHash )) {
    my $t0 = [gettimeofday];
     for my $i (1 .. 10_000) {
        my @intervals = @{ $meta_i[$i-1] };
        my $I = $class->new;
        $I->insert(@$_) for @intervals;
        $I->out;
    }
    say $class, ' ', tv_interval($t0);
}

On my machine, the former algorithm took 25 seconds, while the latter took 15. Interestingly, the simpler algorithm was faster.

Noble Integer

You are given a list, @L, of three or more random integers between 1 and 50. A Noble Integer is an integer N in @L, such that there are exactly N integers greater than N in @L. Output any Noble Integer found in @L, or an empty list if none were found.

An interesting question is whether or not there can be multiple Noble Integers in a list.

Let’ start with the interesting question.

Let’s imagine there are two Noble Integers in a list, N1 and N2. If they aren't equal, we can assume N1 < N2. As N1 is noble, there are N1 numbers in the list greater than N1. N2 is greater than N1, so there must be less numbers greater than it in the list, but there should be N2 such numbers, because N2 is a Noble Integer. That’s a contradiction, so either there’s only one Noble Integer in any list, or there are several Noble Integers, but they are all equal.

Let’s start by sorting the input list in descending order. Then loop over the elements and check whether the number of greater numbers corresponds to the number itself. The number of greater numbers is 0 for the first element in the list (it’s the greatest number in the list, so there’s no greater number), we’ll increment it every time a number is different to the previous one (which ensures the algorithm handles repeated numbers correctly).

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

sub noble_integer {
    my @s = sort { $b <=> $a } @_;
    my $c = 0;
    my @noble;
    for my $i (0 .. $#s) {
        push @noble, $s[$i] if $c == $s[$i];
        ++$c if $s[$i] != ($s[$i + 1] // $s[$i] + 1);
    }
    return @noble
}

use Test::More;

is_deeply [noble_integer(2, 6, 1, 3)], [2];
is_deeply [noble_integer(2, 2, 6, 1, 3)], [2, 2];
is_deeply [noble_integer(0, 0, 0)], [0, 0, 0];

Leave a comment

About E. Choroba

user-pic I blog about Perl.