Perl weekly challenge 99

Here are solutions to this weeks challenges from the Perl Weekly Challenge.

You can find my full code on Github

Challenge 1

You are given a string $S and a pattern $P.

Write a script to check if given pattern validate the entire string. Print 1 if pass otherwise 0.

The patterns can also have the following characters:

  • ? - Match any single character.
  • * - Match any sequence of characters.

Example 1:

Input:  $S = "abcde" $P = "a*e"
Output: 1

Solution

This challenge is relatively simple - converting "file name" wildcards into perl regular expressions. the "*" wildcard is the same as ".*" in perl, "?" is the same as "." in perl.

So we replace them in the regex (and remembering we are tied to the ends of the string).


  my $regex  = '\A' . ( $pattern =~ s{[*]}{.*}r =~ s{[.]}{?}r ).'\Z';

A few notes about generating the regex - First we use the r modifier to return the value of the regex rather than editing in place - secondly we use \A & \Z just out of habit rather than ^ & $.

Challenge 2

You are given two strings $S and $T.

Write a script to find out count of different unique subsequences matching $T without changing the position of characters.

Example 1

Input: $S = "littleit', $T = 'lit'
Output: 5

    1: [lit] tleit
    2: [li] t [t] leit
    3: [li] ttlei [t]
    4: litt [l] e [it]
    5: [l] ittle [it]

Solution

This week's second problem was harder (especially as I thought my code wasn't working as this example was wrong)....

I'll outline the 3 solution functions that I created - two work out the number of sequences (one with memoization), and then a third actually returns the sequences.

The approach to all 3 is to split the string into chunks up until the occurrence of the current first letter of $T.

sub uniq_subseq {
  my( $result, $haystack, $first, $needle )
    = ( 0, $_[0], $_[1] =~ m{(.)(.*)} );
  return scalar @{[ $haystack =~ m{$first}g ]}
    if $needle eq q();
  $result += uniq_subseq( $haystack, $needle )
    while $haystack=~s{.*?$first}{};
  return $result;
}

A bit of a nasty initiliaser - to initialise all the variables we need


  • $result - the count of matches

  • $haystack - the string we are searching in

  • $first - the first character of the needle

  • $needle - the rest of the needle

We need to a regex to split the needle into the two parts!

We then split the code ( a slight optimization ) - if we have reached the end of the needle - then we just need to count the occurances of the needle left {saves some function calls}

We then just strip of the characters of the string (haystack) up to each occurance of our search letter - and use recursion to call the function again - but this time with the remainder of the string (and the shortened needle), and them up and that is our value to return.

Our second function just adds memoization, by using a cache (using state variables inside the function) to remember the results of calling uniq_subseq as, as the string gets longer you will end up repeating this call over and over again.

Note: the second line of the function is used to clear the cache for performance testing {repeatedly running the method for benchmarking will always return the cached value on subsequent calls. We have to do it with a special call of the function as we are using a state variables for the cache - and this is only accessible inside the function itself.

For the simple examples we don't see much gain from the cache - but if you find a more complex problem... with lots of solutions this function works much better - the example we tested showed a speed gain of around 100x.

sub uniq_subseq_cache {
  state $cache = {};
  return $cache={} if $_[0] eq '---'; ## Clear the cache to examine speed
                                      ## Can't clear state cookie from
                                      ## outside function....

  my( $result, $cache_key, $haystack, $first, $needle )
    = ( 0, "$_[0]-$_[1]", $_[0], $_[1] =~ m{(.)(.*)} );

  return $cache->{$cache_key}
    if exists $cache->{$cache_key};
  return $cache->{$cache_key} = scalar @{[ $haystack =~ m{$first}g ]}
    if $needle eq q();
  $result += uniq_subseq_cache( $haystack, $needle )
    while $haystack=~s{.*?$first}{};
  return $cache->{$cache_key} = $result;
}

The final function returns the "annotated" string showing each solution. We use a similar stripping approach - but it adds some complexity - mainly in remembering what we have stripped off the string - while we look for the parts of the needle. See code for comments.

sub display_uniq_subseq {
  my( $haystack, $prev, $regexp, $needle, @result ) = (
    $_[0],          ## haystack (first string)
    @_>2?$_[2]:q(), ## previous string (3rd parameter if it exists)
    $_[1] =~ m{(.)(.*)} ? ('\A(.*?)('.$1.')',$2) : (q(),q()),
    ## The regex for finding matches + the remainder of needle
    ## Slightly more complex than the previous version as we
    ## remove the "optimization" step in the other two functions
  );

  ## If we have exhausted the substring we return the previous part
  ## along with what is left of the haystack.
  ## Note individual mapped letters are surrounded by individual
  ## brackets - to collapse these down to clusters of matched
  ## characters - We collapse adjacent []s by stripping "][".
  ## We again use the "r" modifier to just return the result
  ## of the replacement.
  return ($prev =~s{\]\[}{}gr).$haystack if $regexp eq q();

  ## regex collects anything before the matched letter &
  ## the matched letter
  while( $haystack =~ s{$regexp}{} ) {
    my( $pre_match, $match ) = ($1,$2);
    push @result, display_uniq_subseq(
      $haystack, $needle, $prev.$pre_match.'['.$match.']',
    );
    ## add the match onto the previous string, and
    ## continue to the next match
    $prev .= $pre_match.$match;
  }
  return @result;
}

and without comments:

sub display_uniq_subseq {
  my( $haystack, $prev, $regexp, $needle, @result ) = (
    $_[0], @_>2?$_[2]:q(),
    $_[1] =~ m{(.)(.*)} ? ('\A(.*?)('.$1.')',$2) : (q(),q()) );

  return ($prev =~s{\]\[}{}gr).$haystack if $regexp eq q();

  while( $haystack =~ s{$regexp}{} ) {
    my( $pre_match, $match ) = ($1,$2);
    push @result, display_uniq_subseq(
      $haystack, $needle, $prev.$pre_match.'['.$match.']' );
    $prev .= $pre_match.$match;
  }
  return @result;
}

Leave a comment

About James Curtis-Smith

user-pic Perl developer for nearly 30 years now, mainly in maintenance scripts and web pages, using mod_perl.