Perl Weekly Challenge W024 - Smallest Script, Inverted Index

I've been doing the Perl Weekly Challenge (PWC) for 3 weeks now. So far the challenges have been varied enough to make me use different modules. I even submitted a solution using APIs, which I haven't done at work because I never had a reason to. (lol)

If you'd like to join the fun and contribute, please visit the site managed by Mohammad S Anwar.

Task #1 - Smallest Script:
The tasks for this week's challenge (#24) were a bit confusing at first, but I just did what was asked. The first task was to create the smallest script, as described below:

Create a smallest script in terms of size that on execution doesn’t throw any error. The script doesn’t have to do anything special. You could even come up with smallest one-liner.

There is no problem to solve, so in my entry I just put a $%:

perl -e '$%'

The code did not throw any error, and neither did an empty script when I tried one. I hope next week's challenge is at least a golfing challenge. That was Task #1. Moving on to the next one.
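
For what it's worth, the "no error" claim is easy to check from Perl itself. The sketch below is not part of my entry: the helper name `exit_status_of` is made up, and passing an empty string as the argument to `-e` assumes a Unix-like shell-less `system` call (some Windows setups may drop the empty argument).

```perl
use strict;
use warnings;

# Run a one-liner with the same perl binary ($^X) and return its exit status.
# Hypothetical helper, for illustration only.
sub exit_status_of {
    my ($code) = @_;
    system($^X, '-e', $code);
    return $? >> 8;
}

# Compare the empty script with my entry
for my $code ('', '$%') {
    printf "%-5s exits with status %d (%d bytes)\n",
        "'$code'", exit_status_of($code), length $code;
}
```

A status of 0 means the script ran without error, so both candidates should qualify, with the empty script being the shorter of the two.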

Task #2 - Inverted Index:
I honestly hadn't heard of the term before. Good thing the task includes the Wikipedia description:

Create a script to implement full text search functionality using Inverted Index. According to wikipedia:

In computer science, an inverted index (also referred to as a postings file or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database.

In my solution, I used a hash to associate each word with the files in which it was found. It follows this structure:

$index{$word}{$file}

The hash is later used to print the file(s) where the search words can be found.
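
To make the structure concrete, here is a minimal sketch that builds the same kind of nested hash from two in-memory strings instead of files. The document names (`a.txt`, `b.txt`), their contents, and the helper `build_index` are made up for illustration; they are not part of my submission.

```perl
use strict;
use warnings;

# Hypothetical documents held in memory (stand-ins for real files)
my %docs = (
    'a.txt' => 'I love Perl',
    'b.txt' => 'Perl and love',
);

# Build $index{$word}{$file} = number of occurrences of $word in $file
sub build_index {
    my (%documents) = @_;
    my %index;
    for my $file (keys %documents) {
        $index{lc $_}{$file}++ for $documents{$file} =~ /(\w+)/g;
    }
    return %index;
}

my %index = build_index(%docs);

# The files containing a word are simply the keys of the inner hash
for my $word (qw(love and sing)) {
    my @files = sort keys %{ $index{$word} // {} };
    print "$word: ", (@files ? "@files" : '(N/A)'), "\n";
}
# prints:
# love: a.txt b.txt
# and: b.txt
# sing: (N/A)
```

The lookup never scans the documents again; it only reads the inner hash for each word, which is the whole point of inverting the index.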

Solution:

use strict;
use warnings;
use 5.014;    # the s///r flag used below needs Perl 5.14 or later
use Text::Table::Tiny 'generate_table';

die "Usage:\n\tch-2.pl \n\nExample:\n\tch-2.pl \"i sing eat and love\" file1.txt file2.txt\n\n" if @ARGV < 2;

# Set the minimum length of words to be included
my $minimum_length = 1;

# Get the words to search for and convert them to lowercase.
# Use them as keys in a hash to keep only unique words.
my %hash_words = map { lc $_ => 1 } shift(@ARGV) =~ /([^, ]+) *,?/g;
# Retrieve the keys and store them in @words
my @words = keys %hash_words;

# Create the index from the files
my @files = @ARGV;
my %index;
for my $file (@files) {
    open(my $fh, "<", $file) or die "Cannot open $file: $!\n";
    while (<$fh>) {
        # y///c counts the characters of each candidate word
        for my $w (grep { y///c >= $minimum_length } /(\w+) ?/g) {
            # Strip a leading ".\" from the file name and count the occurrence
            $index{lc $w}{lc $file =~ s/^\.\\//r}++;
        }
    }
    close($fh);
}

sub print_search_result {
    # Show the search result: one row per search word
    my $rows;
    push @{$rows}, ["Words", "File(s)"];
    for my $w (sort @words) {
        my $sub;
        for my $f (sort keys %{ $index{$w} }) {
            $sub .= "$f ";
        }
        push @{$rows}, [$w, $sub || "(N/A)"];
    }
    print generate_table(rows => $rows, header_row => 1);
}

sub show_index {
    # Show the whole file index: one row per indexed word
    my $rows;
    push @{$rows}, ["Words", "File(s)"];
    for my $w (sort keys %index) {
        my $sub;
        for my $f (sort keys %{ $index{$w} }) {
            $sub .= "$f ";
        }
        push @{$rows}, [$w, $sub];
    }
    print generate_table(rows => $rows, header_row => 1);
}

print_search_result();




I used the Text::Table::Tiny module to print out a nice table. The output should look something like this:

perl .\ch-2.pl "i sing eat and love" .\file1.txt .\file2.txt .\file3.txt .\file4.txt .\file5.txt
+-------+--------------------------------+
| Words | File(s)                        |
+-------+--------------------------------+
| and   | file1.txt file2.txt            |
| eat   | file4.txt                      |
| i     | file1.txt file2.txt file4.txt  |
| love  | file2.txt file5.txt            |
| sing  | (N/A)                          |
+-------+--------------------------------+
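
The way the search string is turned into unique, lowercased words can also be tried on its own. This is a small sketch using the same regex as the solution; the function name `unique_words` and the sample input are made up for illustration.

```perl
use strict;
use warnings;

# Split on spaces and/or commas, lowercase each word,
# and deduplicate by using the words as hash keys.
sub unique_words {
    my ($query) = @_;
    my %seen = map { lc($_) => 1 } $query =~ /([^, ]+) *,?/g;
    return sort keys %seen;
}

print join('|', unique_words('I sing, eat,and LOVE love')), "\n";
# prints: and|eat|i|love|sing
```

Because the words pass through a hash, "LOVE" and "love" collapse into a single entry, which is why each search word shows up only once in the result table.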


About Yet Ebreo

I blog about Perl.