Perl Weekly Challenge 92: Isomorphic Strings and Insert Intervals

These are some answers to the Week 92 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a few hours. This blog post offers some solutions to this challenge, please don’t read on if you intend to complete the challenge on your own.

Task 1: Isomorphic Strings

You are given two strings $A and $B.

Write a script to check if the given strings are Isomorphic. Print 1 if they are, otherwise 0.

Example 1:

Input: $A = "abc"; $B = "xyz"
Output: 1

Example 2:

Input: $A = "abb"; $B = "xyy"
Output: 1

Example 3:

Input: $A = "sum"; $B = "add"
Output: 0

Isomorphic Strings in Raku

For this task, we’re using 5 test cases. For each string pair, the program calls the is-isomorphic subroutine, which does the bulk of the work.

The first thing that the is-isomorphic subroutine does is to compare the string lengths and return 0 if they are not equal. This may appear to be a performance enhancement (and, it is indeed), but, more importantly, this simplifies the rest of the procedure since, from there on, we know that the strings we are processing have the same character count. Then, we use the %transcode hash to record the translation table from the first string to the second one. We return 0 if any character in the first string would need to be translated into two different characters in the second one. This is not sufficient, though: for example, the last example in the task description (“sum” and “add”) would pass this test, although the words are not isomorphic. So we also use the %seen SetHash to record the letters of the second string that we have already seen and return 0 if any pair of characters required a new entry in the %transcode hash whereas the target letter has already been seen.

use v6;
my @tests = (< abc xyz >), (< abb xyy >), (< sum add >),
    (< ACAB XCXY >), (< abc uvwxy >);

    for @tests -> @strings {
        say "@strings[]: ", is-isomorphic @strings;
    }

sub is-isomorphic (@strings) {
    my ($str1, $str2) = @strings;
    return 0 if $str1.chars != $str2.chars;
    my %transcode;
    my SetHash $seen;
    for 0..$str1.chars - 1 -> $i {
        my $char1 = $str1.substr($i, 1);
        my $char2 = $str2.substr($i, 1);
        if %transcode{$char1}:exists {
            return 0 if %transcode{$char1} ne $char2;
        } else {
            return 0 if $seen{$char2};
            %transcode{$char1} = $char2;
            $seen{$char2}++;;
        }
    }
    return 1
}

This script displays the following output:

$ raku isomorphic.raku
abc xyz: 1
abb xyy: 1
sum add: 0
ACAB XCXY: 1
abc uvwxy: 0

Isomorphic Strings in Perl

This is essentially the same code as before in Perl. Please refer to the explanations above if needed. The only significant difference is that, since there is no SetHash in Perl, we use the plain hash %seen.

use strict;
use warnings;
use feature "say";

my @tests = ( [qw< abc xyz >], [qw< abb xyy >], [qw< sum add >],
    [qw< ACAB XCXY >], [qw< abc uvwxy >] );

    for my $strings_ref (@tests) {
        say "@$strings_ref: ", is_isomorphic(@$strings_ref);
    }

sub is_isomorphic {
    my ($str1, $str2) = @_;
    return 0 if length $str1 != length $str2;
    my (%transcode, %seen);
    for my $i (0..length($str1) - 1) {
        my $char1 = substr $str1, $i, 1;
        my $char2 = substr $str2, $i, 1;;
        if (exists $transcode{$char1}) {
            return 0 if $transcode{$char1} ne $char2;
        } else {
            return 0 if exists $seen{$char2};
            $transcode{$char1} = $char2;
            $seen{$char2} = 1;
        }
    }
    return 1
}

Output:

$ perl isomorphic.pl
abc xyz: 1
abb xyy: 1
sum add: 0
ACAB XCXY: 1
abc uvwxy: 0

Isomorphic Strings in Scala

This is essentially a port to Scala of the above two programs in Raku and Perl. See the explanations above if needed. Perl’s and Raku’s hashes are called Maps in Scala. Also, Sets are usually not mutable, so we need to import scala.collection.mutable.Set.

import scala.collection.mutable.Set

object Main {
  def main(args: Array[String]): Unit = {
    val (str1, str2) =
      if (args.size == 2) (args(0), args(1)) else ("abc", "xyz")
    println(s"args = $str1 and $str2")
    println(isIsomorphic(str1, str2))
  }
  def isIsomorphic(str1: String, str2: String): Int = {
    if (str1.length != str2.length) return 0
    var transcode: Map[Char, Char] = Map()
    var seen: Set[Char] = Set()
    for (i <- 0 to str1.length - 1) {
      if (transcode.contains(str1(i))) {
        if (str1(i) != str2(i)) { return 0 }
      } else {
        if (seen.contains(str2(i))) return 0
        transcode += (str1(i) -> str2(i))
        seen += str2(i)
      }
    }
    return 1
  }
}

Sample output:

args = abc and xyz
1

Task 2: Insert Intervals

You are given a set of sorted non-overlapping intervals and a new interval.

Write a script to merge the new interval to the given set of intervals.

Example 1:

Input $S = (1,4), (8,10); $N = (2,6)
Output: (1,6), (8,10)

Example 2:

Input $S = (1,2), (3,7), (8,10); $N = (5,8)
Output: (1,2), (3,10)

Example 3:

Input $S = (1,5), (7,9); $N = (10,11)
Output: (1,5), (7,9), (10,11)

On Non-Overlapping and Adjacent Intervals

When you create or use such intervals, in probably more than 90% of the use cases, you generally also want to avoid adjacent intervals, because the aim is usually to summarize an integer (or alphabetic or other) sequence as much as possible. From that standpoint, the input set of the second example would not be (1,2), (3,7), (8,10), but (1,10).

My first Raku attempt merged adjacent intervals. It simply expanded all intervals, sorted the data and recreated intervals:

use v6;

my @tests = [(2, 6), (1, 4), (8, 10)], [(5, 8), (1, 2), (3,7 ), (8, 10)], [(10, 11), (1, 5), (7, 9)];

# Caution: not complying with the task rules, as this
# script also merges adjacent intervals
for @tests -> @test {
    say @test;
    my @full-data = (map { |( $_[0]..$_[1]) }, @test).Set.keys.sort;
    my $first = shift @full-data;
    my @result;
    my $last = $first;
    for @full-data -> $item {
        if $item != $last + 1 {
           push @result, "($first, $last), ";
           $first = $item;
        }
        $last = $item;
    }
    push @result, "($first, $last), ";
    say ~@result.subst(/','\s+$/, "\n");
}

This produces the following output:

$ raku interval_1.raku
[(2 6) (1 4) (8 10)]
(1, 6),  (8, 10)

[(5 8) (1 2) (3 7) (8 10)]
(1, 10)

[(10 11) (1 5) (7 9)]
(1, 5),  (7, 11)

As noted before, to me, this is more in line with what you usually want to do with such intervals. But, the task specification says “non-overlapping”, it doesn’t mention “non-adjacent” intervals and the examples are quite clear, so fair enough, my solution above is wrong. It’s a pity, since this is really simpler. A correct solution will be given below.

On the Size of the Input Data

The examples provided are very small, with only two or three intervals. This led me to an algorithm that sorts again the input (together with the additional interval) and then loops over the contents to merge overlapping intervals.

This, of course, wouldn’t be very efficient with large input data. A friend of mine once asked my help to deal with a somewhat similar problem, but involving very large data sets in the field of genetics. His program stored millions of non-overlapping ranges in a hash and was then trying to figure out whether any of several hundreds of thousands regions (effectively, also ranges) from another file overlapped with the ranges in his hash. This program essentially scanned sequentially the full hash for each region in the input file, and it would take days to complete. I suggested that, rather than using a hash for the input ranges, it would be better to use a sorted array and to implement a binary search. With dummy data (one million ranges), I showed that the search part of the program would run more than 20,000 times faster with binary search. Of course, his program was doing a number of other things, so, overall, that performance enhancement on a sample of real data was diluted to a factor of 35 (run time from 28 hours down to 50 minutes). For details, see the second part of the slides for a talk I gave on the subject at the 2018 German Perl Workshop in Cologne on this subject.

With input data of three or four ranges, it would be technological overkill to implement a binary search.

Insert Intervals in Raku

In my test range sets, the first interval is the one to be added and the rest is the input set of ranges. Thus, it would be easy to shift the first range and to place it in the proper position in the rest of the set. That’s not what I did, however, since, as noted above, the program is simply sorting the ranges (in accordance with the start of the range) and then proceeds to fix the overlapping ranges.

my @tests = [(2, 6), (1, 4), (8, 10)], [(5, 8), (1, 2), (3,7 ), (8, 10)], [(10, 11), (1, 5), (7, 9)];

for @tests <-> @test {
    say @test;
    my @sorted = sort { $_[0] }, @test;
    say @sorted;
    my @result;
    my $first = shift @sorted;
    my ($start, $end) = $first[0, 1];
    for @sorted -> $item {
        # say "[$item] $start $end ", @result;
        if $item[0] > $end {
            push @result, [$start, $end];
            # say "Temp res = ", @result;
            ($start, $end) = $item;
        } else {
            $end = $item[1];
        }
    }
    push @result, [$start, $end] unless $start == @result[*-1][0];
    say "result = ", @result, "\n";
}

This displays the following output:

$ raku interval_2.raku
[(2 6) (1 4) (8 10)]
[(1 4) (2 6) (8 10)]
result = [[1 6] [8 10]]

[(5 8) (1 2) (3 7) (8 10)]
[(1 2) (3 7) (5 8) (8 10)]
result = [[1 2] [3 10]]

[(10 11) (1 5) (7 9)]
[(1 5) (7 9) (10 11)]
result = [[1 5] [7 9] [10 11]]

Insert Intervals in Perl

This is essentially a port of the above (second) Raku program to Perl:

use strict;
use warnings;
use feature "say";
use Data::Dumper;

my @tests = ([[2, 6], [1, 4], [8, 10]], [[5, 8], [1, 2], [3,7 ], [8, 10]], [[10, 11], [1, 5], [7, 9]]);

for my $test (@tests) {
    say join " ", map { "[ @$_ ]" } @$test;
    my @sorted = sort { $a->[0] <=> $b->[0] } @$test;
    say join " ", map { "[ @$_ ]" } @sorted;
    my @result;
    my $first = shift @sorted;
    my ($start, $end) = @$first;
    for my $item (@sorted) {
        # say "[$item] $start $end ", @result;
        if ($item->[0] > $end) {
            push @result, [$start, $end];
            # say "Temp res = ", @result;
            ($start, $end) = @$item;
        } else {
            $end = $item->[1];
        }
    }
    push @result, [$start, $end] unless $start == $result[-1][0];
    say "result = ", join (" ", map { "[ @$_ ]" } @result), "\n";
}

This displays essentially the same output as the Raku program:

$ perl interval.pl
[ 2 6 ] [ 1 4 ] [ 8 10 ]
[ 1 4 ] [ 2 6 ] [ 8 10 ]
result = [ 1 6 ] [ 8 10 ]

[ 5 8 ] [ 1 2 ] [ 3 7 ] [ 8 10 ]
[ 1 2 ] [ 3 7 ] [ 5 8 ] [ 8 10 ]
result = [ 1 2 ] [ 3 10 ]

[ 10 11 ] [ 1 5 ] [ 7 9 ]
[ 1 5 ] [ 7 9 ] [ 10 11 ]
result = [ 1 5 ] [ 7 9 ] [ 10 11 ]

Insert Intervals in Scala

This a port of the above Raku and Perl programs to Scala:

import Array._
object insertIntervals extends App {
  val tests =
    Array(
      Array(Array(2, 6), Array(1, 4), Array(8, 10)),
      Array(Array(5, 8), Array(1, 2), Array(3, 7)),
      Array(Array(10, 11), Array(1, 5), Array(7, 9))
    )

  for (test <- tests) {
    printArray(test)
    val sorted = test.sortWith(_(0) < _(0))
    printArray(sorted)
    var result = Array.empty[Array[Int]]
    val first = sorted(0)
    var start = first(0)
    var end = first(1)
    for (item <- sorted) {
      if (item(0) > end) {
        result = result :+ Array(start, end)
        start = item(0)
      }
      end = item(1)
    }
    result = result :+ Array(start, end)
    print("Result: ")
    printArray(result)
    println(" ")
  }
  def printArray(input: Array[Array[Int]]): Unit = {
    for (item <- input) {
      print(s"( ${item(0)} ${item(1)} ) ") 
    }
    println()
  }
}

Output:

( 2 6 ) ( 1 4 ) ( 8 10 ) 
( 1 4 ) ( 2 6 ) ( 8 10 ) 
Result: ( 1 6 ) ( 8 10 ) 

( 5 8 ) ( 1 2 ) ( 3 7 ) 
( 1 2 ) ( 3 7 ) ( 5 8 ) 
Result: ( 1 2 ) ( 3 8 ) 

( 10 11 ) ( 1 5 ) ( 7 9 ) 
( 1 5 ) ( 7 9 ) ( 10 11 ) 
Result: ( 1 5 ) ( 7 9 ) ( 10 11 )

Wrapping up

The next week Perl Weekly Challenge will start soon and will end next year, a year that we all hope will be better than this year. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, January 3, 2021. And, please, also spread the word about the Perl Weekly Challenge if you can.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.