Perl Weekly Challenge 255: Most Frequent Word

These are some answers to Task 2 of Week 255 of the Perl Weekly Challenge, organized by Mohammad S. Anwar.

Spoiler Alert: The deadline for this weekly challenge is a few days away (February 11, 2024, at 23:59). This blog post provides some solutions to this challenge. Please don’t read on if you intend to complete the challenge on your own.

Task 2: Most Frequent Word

You are given a paragraph $p and a banned word $w.

Write a script to return the most frequent word that is not banned.

Example 1

Input: $p = "Joe hit a ball, the hit ball flew far after it was hit."
       $w = "hit"
Output: "ball"

The banned word "hit" occurs 3 times.
The other word "ball" occurs 2 times.

Example 2

Input: $p = "Perl and Raku belong to the same family. Perl is the most popular language in the weekly challenge."
       $w = "the"
Output: "Perl"

The banned word "the" occurs 3 times.
The other word "Perl" occurs 2 times.

Most Frequent Word in Raku

We first use the tr/// in-place transliteration operator to remove punctuation characters from the input paragraph, so that the words method can then split the paragraph into words. We then use grep to remove the banned word from the word list and convert the resulting list into a Bag, $histo (for histogram), which maps each word to its number of occurrences. Finally, we return the key of the bag with the highest frequency.

sub most-frequent-word ($para is copy, $banned) {
    $para ~~ tr/,.:;?!//;
    my $histo = $para.words.grep({$_ ne $banned}).Bag;
    return $histo.keys.max({$histo{$_}});
}

my $t = "Joe hit a ball, the hit ball flew far after it was hit.";
printf "%-30s... => ", substr $t, 0, 28;
say most-frequent-word $t, "hit";

$t = "Perl and Raku belong to the same family. Perl is the most popular language in the weekly challenge.";
printf "%-30s... => ", substr $t, 0, 28;
say most-frequent-word $t, "the";

This program displays the following output:

$ raku ./most-frequent-word.raku
Joe hit a ball, the hit ball  ... => ball
Perl and Raku belong to the   ... => Perl

Most Frequent Word in Perl

This is a port to Perl of the Raku program above, using a hash instead of a Bag and the split function instead of words.

use strict;
use warnings;
use feature 'say';

sub most_frequent_word {
    my ($para, $banned) = @_;
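    $para =~ tr/,.:;?!//d;    # in Perl, /d is needed to actually delete the characters
    my %histo;
    # Count every word except the banned one
    $histo{$_}++ for grep { $_ ne $banned } split ' ', $para;
    # Sort the words by descending frequency and keep the first one
    return (sort { $histo{$b} <=> $histo{$a} } keys %histo)[0];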
}

my $t = "Joe hit a ball, the hit ball flew far after it was hit.";
printf "%-30s... => ", substr $t, 0, 28;
say most_frequent_word $t, "hit";

$t = "Perl and Raku belong to the same family. Perl is the most popular language in the weekly challenge.";
printf "%-30s... => ", substr $t, 0, 28;
say most_frequent_word $t, "the";

This program displays the following output:

$ perl ./most-frequent-word.pl
Joe hit a ball, the hit ball  ... => ball
Perl and Raku belong to the   ... => Perl
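
Sorting all the distinct words just to keep the first one does more work than strictly needed, and the word returned when several words share the highest count depends on hash ordering. As a possible variation, here is a minimal sketch that scans the histogram once and breaks ties alphabetically. The subroutine name most_frequent_word_scan and the tie-breaking rule are my own assumptions; the task itself does not say what to do in case of a tie.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical variant: find the maximum with a single scan of the
# histogram instead of sorting all the keys.
sub most_frequent_word_scan {
    my ($para, $banned) = @_;
    $para =~ tr/,.:;?!//d;      # /d deletes the listed punctuation
    my %histo;
    $histo{$_}++ for grep { $_ ne $banned } split ' ', $para;

    my ($best, $max) = (undef, 0);
    for my $word (keys %histo) {
        my $count = $histo{$word};
        # A higher count wins; on equal counts (assumed rule),
        # prefer the alphabetically first word.
        if ($count > $max or ($count == $max and $word lt $best)) {
            ($best, $max) = ($word, $count);
        }
    }
    return $best;               # undef if no word is left
}

say most_frequent_word_scan(
    "Joe hit a ball, the hit ball flew far after it was hit.", "hit");
```

With the first example of the task, this sketch should print ball, like the sort-based version.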

Wrapping up

Next week's Perl Weekly Challenge will start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 (UK time) on February 18, 2024. And, please, also spread the word about the Perl Weekly Challenge if you can.

About laurent_r

I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.