Perl Weekly Challenge 247: Most Frequent Letter Pair

These are some answers to the Week 247, Task 2, of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a few days from now (on December 17, 2023, at 23:59). This blog post provides some solutions to this challenge. Please don’t read on if you intend to complete the challenge on your own.

Task 2: Most Frequent Letter Pair

You are given a string S of lower case letters 'a'..'z'.

Write a script that finds the pair of consecutive letters in S that appears most frequently. If there is more than one such pair, chose the one that is the lexicographically first.

Example 1

Input: $s = 'abcdbca'
Output: 'bc'

'bc' appears twice in `$s`

Example 2

Input: $s = 'cdeabeabfcdfabgcd'
Output: 'ab'

'ab' and 'cd' both appear three times in $s and 'ab' is lexicographically smaller than 'cd'.

Most Frequent Letter Pair in Raku

We first split the input string into an array of letters, build all the possible pairs of consecutive letters, and store the pairs and their frequency in a hash. Once this is completed, we sort the %pairs hash in descending order of their frequencies and, when the frequency is equal, in alphabetic order, and return the first item of the sorted list.

sub most-frequent-pair ($str) {
    my %pairs;
    my @letters = $str.comb;
    for 1..@letters.end -> $i {
        my $pair = @letters[$i-1] ~ @letters[$i];
        %pairs{$pair}++;
    }
    return (sort { %pairs{$^b} <=> %pairs{$^a} || 
                $^a leg $^b }, %pairs.keys)[0];
}

for 'abcdbca', 'cdeabeabfcdfabgcd', 'bcabbc' -> $test {
    printf "%-20s => ", $test;
    say most-frequent-pair $test;
}

This program displays the following output:

$ raku ./most-frequent-pair.raku
abcdbca              => bc
cdeabeabfcdfabgcd    => ab
bcabbc               => bc

Most Frequent Letter Pair in Perl

This is a port to Perl of the above Raku program. Please refer to the above section if you need further explanations on how it works.

use strict;
use warnings;
use feature 'say';

sub most_frequent_pair {
    my %pairs;
    my @letters = split //, shift;
    for my $i (1..$#letters) {
        my $pair = $letters[$i-1] . $letters[$i];
        $pairs{$pair}++;
    }
    return (sort { $pairs{$b} <=> $pairs{$a} || 
                $a cmp $b } keys %pairs)[0];
}

for my $test ('abcdbca', 'cdeabeabfcdfabgcd', 'bcabbc') {
    printf "%-20s => ", $test;
    say most_frequent_pair $test;
}

This program displays the following output:

$ perl ./most-frequent-pair.pl
abcdbca              => bc
cdeabeabfcdfabgcd    => ab
bcabbc               => bc

Wrapping up

The next week Perl Weekly Challenge will start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on December 24, 2023. And, please, also spread the word about the Perl Weekly Challenge if you can.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.