Perl weekly challenge 92

These are some answers to the Week 92 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a few days (December 27, 2020). This blog post offers some solutions to this challenge, please don’t read on if you intend to complete the challenge on your own.

I'm not a great blogger - but I will try and explain my solutions to the Perl weekly challenge each week. I always try and look for interesting solutions to the problems at hand.

Part 1

I started with what I thought was the best approach - keeping a hash of the letters in both words and if they don't match the required pattern then to return 0 otherwise return 1. But this looked rather messy. There must be a simpler, more elegant solution.

I thought how would I compare the words, and "normalisation" sprang to mind:

Replacing the first letter you see with "a", the second with "b", etc. That means if the pattern of repeated letters are the same (isomorphic) you would get a similar pattern in this normalised words... Now how to make this work...

Well that is relatively simple:

  • Have a counter that starts with 'a' - Perl then treats this as a text counter and so ++ generates a sequence 'a', 'b', 'c', 'd', ....
  • We create a hash of these letters, key being the letter in the word, value being the "normalised letter". If the letter is a repeat, it will already be a key in the hash so we return that.
  • This leads to a very simple way of normalising a string

my ($x,%m) = 'a'; $normalised_string = join q(), map { $m{$_}||=$x++ } split m{}, $_;

We can embed this in a map to remove the need to write the code twice in the function, by mapping over the parameters in the function call. To get the normalised strings $a & $b see below.

It is then just a case of checking $a eq $b, which evaluates to 1 or "" and "or"ing with 0, makes sure the function returns 0 in the false case.

The function then becomes:

sub iso {
my ($a,$b) = map {
my ($x,%m)='a'; join q(), map { $m{$_}||=$x++ } split m{}, $_ } @_;
return $a eq $b || 0;
}

I decided to write another script that would work out the lists of all "isomorphic" words. This uses the normal form to group the words - and then print out those which have more than 1 word in the list.

I started with a list of 39944 words. 34342 are these are "isomorphic" of which 11,633 were words made entirely up of different letters.

Part 2

This was a much trickier proposition - but something I had come across before - with tiling variations, exons and BLAST hits on the human genome, and computing length of overlapping service from a database {with some really nasty MySQL code!}

This is simpler though as we know the regions we start with are all non-overlapping and in order.

  • If the new region doesn't overlap any of the regions we just push it into the list and return. Whether that be at the front in the middle or at the end.
  • We can test for the first two cases in the loop $new->{end} < $element->{start}, and then return the list of elements before $new, $new, $element and the rest of the list.
  • The last case is the end of the loop, and we just return the list with $new tacked on the end.
  • The problem comes when $new overlaps one of the regions. Firstly we work out the region containing both of the regions. Then we have to repeatedly see if this overlaps the next region and if it does repeat the process again. We then push this region onto the list.

sub int_insert {
my( $new, @list ) = @_;
my @new_list;
while(my $e = shift @list) {
return [ @new_list, $new, $e, @list ] if $e->[0] > $new->[1];
## The start of the next element is after the end of the new element -
## so we can safely push the new element and the rest of the list (and return it)
if( $e->[1] < $new->[0] ) { ## Next element is to the left of the new
push @new_list,$e; ## element so push and continue
next;
}
$new->[0] = $e->[0] if $e->[0] < $new->[0];
$new->[1] = $e->[1] if $e->[1] > $new->[1]; ## Get start/end of first overlap...
while( @list && $new->[1] >= $list[0][0] ) { ## this also overlaps the next list element
$new->[1] = $list[0][1] if $list[0][1] > $new->[1]; ## Update the end of the region if rqd
shift @list; ## Remove element from list;
}
return [ @new_list, $new, @list ]; ## The rest of the list will be after the "new" element now
} ## So we can safely push and return it....
return [ @new_list, $new ]; ## The new element must be after the list so we just
## return it on the end of the list...
}

The extension to this code to work out the overlap if you have two (or more) ascend non-overlapping sequences is the one that is needed to work out the overlapping transcript or mRNA/cDNA

The code is therefore slightly more complex:

The parameters are the list of arrays of intervals


sub int_merge {
my ($result,@rest) = @_; ## List of arrays of ascending non-overlapping regions
my @a = @{$result}; # First list.
foreach (@rest) {
my @b = @{$_};
my @new;
while( @a && @b ) { ## Stop when we get to one of the lists of intervals.
if( $a[0][1] < $b[0][0] ) { ## The first element of @a ends before the first element of @b
push @new, shift @a;
next;
}
if( $b[0][1] < $a[0][0] ) { ## The first element of @b ends before the first element of @a
push @new, shift @b;
next;
}
## We have an overlap;
## The new region starts as the overlap of the first elements of both lists
my $new_region = shift @a;
$new_region->[0] = $b[0][0] if $new_region->[0] > $b[0][0];
$new_region->[1] = $b[0][1] if $new_region->[1] < $b[0][1];
shift @b; ## New region is the overlap of the two regions
while( @a || @b ) { ## Now we look through the two lists for elements that
## overlap this new region....
## Note we only stop if both of the next to elements
## don't overlap the region OR we get to the end of
## both lists
if( @a && $a[0][0] <= $new_region->[1] ) { ## Element of first list overlaps -
$new_region->[1] = $a[0][1] if $new_region->[1] < $a[0][1]; ## extend region if required
shift @a; ## and remove element
next;
}
if( @b && $b[0][0] <= $new_region->[1] ) { ## Element of first list overlaps -
$new_region->[1] = $b[0][1] if $new_region->[1] < $b[0][1]; ## extend region if required
shift @b; ## and remove element
next;
}
last; ## Neither element overlaps so we finish this block
}
push @new, $new_region; # Add new region to list
}
push @new, @a, @b; ## We will have some of @a or @b left so push both on the answer
@a = @new; ## Copy the new list of regions to @a - and repeat if required
}
return \@a;
}


Leave a comment

About James Curtis-Smith

user-pic Perl developer for nearly 30 years now, mainly in maintenance scripts and web pages, using mod_perl. I also code a lot in PHP (20+yrs), and also extensive experience in HTML (27+yrs), Javascript(25+yrs), CSS (24+yrs) and SASS.