Perl Weekly Challenge 99: Pattern Match and Unique Subsequence

These are some answers to the Week 99 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a couple of days (February 14, 2021). This blog post offers some solutions to this challenge, please don’t read on if you intend to complete the challenge on your own.

Task 1: Pattern Match

You are given a string $S and a pattern $P.

Write a script to check if given pattern validate the entire string. Print 1 if pass otherwise 0.

The patterns can also have the following characters: - ? - Match any single character. - * - Match any sequence of characters.

Example 1:

Input: $S = "abcde" $P = "a*e"
Output: 1

Example 2:

Input: $S = "abcde" $P = "a*d"
Output: 0

Example 3:

Input: $S = "abcde" $P = "?b*d"
Output: 0

Example 4:

Input: $S = "abcde" $P = "a*c?e"
Output: 1

So the pattern are similar to those used in the Unix ls command, with literal match for letters, digits and other characters common used in file names, and two wild-card characters, ? for any single character, and * for any sequence of characters.

Both in Raku and Perl, we will build a regex pattern by replacing * with the .* regex sequence, and ? with the . regex wild-card character. In addition, since we are requested to match the entire string, we will add the ^ start of string and $ end of string regex anchors.

Pattern Match in Raku

We could use regex substitutions to build the regex pattern (as we did below for the pattern match in Perl solution), but, here, the match subroutine loops through the input pattern characters and construct the regex pattern by hand. This subroutine finally applies regex matching to the input string.

use v6;

my $in = "abcde";
my @test-patterns = <a*e a*d ?b*d a*c?e>;
for @test-patterns -> $test {
    say "$test: ", match $test, $in;
}

sub match (Str $pattern, Str $in) {
    my $regex = 
        join "", gather {
            take '^';
            for $pattern.comb {
                when '*' { take '.*' }
                when '?' { take '.'  }
                default  { take $_   }
            }
            take '$';
    }
    return  $in ~~ /<$regex>/ ?? 1 !! 0;
}

With the “abcde” input string and the four input patterns, this program displays the following output:

$ raku pattern-match.raku
a*e: 1
a*d: 0
?b*d: 0
a*c?e: 1

Pattern Match in Perl

As mentioned above, I decided here to use regex substitutions to convert the input pattern into a regex pattern.

use strict;
use warnings;
use feature "say";

my $input = "abcde";
my @test_patterns = qw/a*e a*d ?b*d a*c?e/;
for my $pat (@test_patterns) {
    say "$pat: ", match($input, $pat)
}

sub match {
    my ($in, $pattern) = @_;
    $pattern =~ s/\*/.*/g;
    $pattern =~ s/\?/./g;
    $pattern = "^$pattern\$";
    return $in =~ /$pattern/ ? 1 : 0;
}

This program displays the same output as the Raku program above:

$ perl match-pattern.pl
a*e: 1
a*d: 0
?b*d: 0
a*c?e: 1

Task 2: Unique Subsequence

You are given two strings $S and $T.

Write a script to find out count of different unique subsequences matching $T without changing the position of characters.

UPDATE: 2021-02-08 09:00 AM (UK TIME) suggested by Jonas Berlin, missing entry [5].

Example 1:

Input: $S = "littleit', $T = 'lit'
Output: 5

    1: [lit] tleit
    2: [li] t [t] leit
    3: [li] ttlei [t]
    4: litt [l] e [it]
    5: [l] ittle [it]

Example 2:

Input: $S = "london', $T = 'lon'
Output: 3

    1: [lon] don
    2: [lo] ndo [n]
    3: [l] ond [on]

Dear Mohammad, when you do such updates, would you please be kind enough to send an e-mail informing us of such change. I had loaded the web page before the update, and spent a couple of hours trying to fix my program against your solution until I found that, in fact, my program was correct and your initial solution wrong. Only at that point did I think about re-loading the page and found that you had fixed the solution on the Web page. Every one can make mistakes, I have no problem with that, but please inform us when this happens.

Unique Subsequence in Raku

Although I admit that there may be some more efficient solution in terms of speed performance, I felt that using brute force with the combinations built-in method was better in terms of coding efficiency. The program generates all the input letters combinations having the size of the searched substring, filters out those not matching the input substring, and finally returns the number of matching substrings.

use v6;

my @input-tests = [ "littleit", "lit"], ["london", "lon"];

for @input-tests -> $test {
    my ($in, $substr) = $test[0..1];
    say "$test: ", search-substr $in, $substr;
}
sub search-substr (Str $in, Str $substr) {
    my @results = $in.comb.combinations($substr.\
        chars)>>.join("").grep({$_ eq $substr}).elems;
}

All the real work is done in a single code line (well, formatted here over two lines for a better graphical rendering on this blog page).

This program displays the following output:

$ ./raku subsequence.raku
littleit lit: [5]
london lon: [3]

Unique Subsequence in Perl

The solution in Perl is completely different from the Raku solution. Here, we use the search_substr recursive subroutine to explore all the possible substrings. This program could generate all the letter combinations as the Raku program, except that it cuts the process for any letter combination that will never eventually match the search substring, so that the program is doing much less useless work. Presumably this should make the program faster, especially for longer input strings and substrings.

use strict;
use warnings;
use feature "say";

my @input_tests = ( [ "littleit", "lit"], ["london", "lon"], ["aaaa", "aa"]);
my $count;
for my $in (@input_tests) {
    $count = 0;
    search_substr (@$in);
    say "@$in: $count";
}
sub search_substr {
    my ($in, $searched) = @_; 
    my $start = substr $searched, 0, 1;
    my $index = 0;
    while (1) {
        $index = index $in, $start, $index;
        return if $index < 0;
        $index++;
        ++$count and next  if length $searched == 1;
        search_substr (substr($in, $index), substr($searched, 1));      
    }
}

This programs displays the following output:

$ perl subsequence.pl
littleit lit: 5
london lon: 3
aaaa aa: 6

Wrapping up

The next week Perl Weekly Challenge will start soon and will be the 100th challenge. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, February 21, 2021. And, please, also spread the word about the Perl Weekly Challenge if you can.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.