Perl Weekly Challenge W024 - Smallest Script, Inverted Index

I've been doing the Perl Weekly Challenge (PWC) for 3 weeks now. So far the challenges have been varied enough to make me use different modules. I even submitted a solution using APIs, which I haven't done at work because I never had a reason to. (lol)

If you'd like to join the fun and contribute, please visit the site managed by Mohammad S Anwar.

Task #1 - Smallest Script:
The tasks for this week's challenge (#24) were a bit confusing at first, but I just did what was asked. The first task was to create the smallest script, as described below:

Create a smallest script in terms of size that on execution doesn’t throw any error. The script doesn’t have to do anything special. You could even come up with smallest one-liner.

There is no problem to solve, so in my entry I just put a $%:
perl -e '$%'
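For anyone who wants to check a candidate mechanically, here is a minimal sketch that runs a snippet through the current perl binary and looks at the exit status. The helper name runs_cleanly is made up for this illustration; $^X is the path of the running interpreter, and $% is Perl's format page-number variable.

```perl
use strict;
use warnings;

# Hypothetical helper: true if perl runs the given code and exits with status 0.
sub runs_cleanly {
    my ($code) = @_;
    # $^X is the perl binary running this script; the list form of
    # system() bypasses the shell, so no extra quoting is needed.
    return system($^X, '-e', $code) == 0;
}

# Try the empty program and the $% entry.
for my $code ('', '$%') {
    printf "%-5s %s\n", "'$code'", runs_cleanly($code) ? 'ok' : 'failed';
}
```

This only checks the exit status, which is one reasonable reading of "doesn't throw any error".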

The one-liner did not throw any error, and neither did an empty script. I hope next week's challenge will at least be a golfing one. And that was Task #1. Moving on to Task #2.

Task #2 - Inverted Index:
I honestly hadn't heard of the term before. Good thing the task includes the Wikipedia description:

Create a script to implement full text search functionality using Inverted Index. According to wikipedia:

In computer science, an inverted index (also referred to as a postings file or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database.
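To make the forward/inverted distinction concrete, here is a small sketch that starts from a forward index (document to words) and flips it into an inverted one (word to documents). The document names and words are made-up sample data, not from the challenge.

```perl
use strict;
use warnings;

# Forward index: document => list of words it contains (made-up sample data).
my %forward = (
    'doc1.txt' => [qw(i love perl)],
    'doc2.txt' => [qw(i eat and sing)],
);

# Inverted index: word => set of documents containing it.
my %inverted;
for my $doc (keys %forward) {
    $inverted{$_}{$doc} = 1 for @{ $forward{$doc} };
}

# A lookup is now one hash access instead of a scan over every document.
for my $word (qw(i perl dance)) {
    my @docs = sort keys %{ $inverted{$word} || {} };
    print "$word: ", (@docs ? "@docs" : '(none)'), "\n";
}
# prints:
# i: doc1.txt doc2.txt
# perl: doc1.txt
# dance: (none)
```

The extra work happens once, when the index is built; that is the "increased processing when a document is added" mentioned in the definition.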

In my solution, I used a hash to associate each word with the files in which it was found. It follows the structure:
$index{$word}{$file}

The hash is used later to print the file(s) where each search key can be found.
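Here is a minimal sketch of how a two-level hash like this gets filled and queried. To keep it runnable without files on disk, it indexes a couple of made-up in-memory strings (the %text_of hash is an assumption for this example only).

```perl
use strict;
use warnings;

# Hypothetical in-memory "files" so the sketch runs without touching disk.
my %text_of = (
    'file1.txt' => "I sing and I dance",
    'file2.txt' => "Sing a song",
);

my %index;
for my $file (sort keys %text_of) {
    # Two-level shape: $index{$word}{$file} holds an occurrence count.
    $index{lc $_}{$file}++ for $text_of{$file} =~ /\w+/g;
}

# Which files contain "sing"?
print join(', ', sort keys %{ $index{sing} }), "\n";     # file1.txt, file2.txt

# How many times does "i" occur in file1.txt?
print $index{i}{'file1.txt'}, "\n";                       # 2

# exists() on the first level avoids autovivifying missing words.
print exists($index{love}) ? "found\n" : "not found\n";  # not found
```

Because the second level is keyed by file name, each file is listed once per word no matter how often the word occurs in it.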

Solution:
use strict;
use warnings;
use 5.014; # the s///r flag used below needs Perl 5.14 or later

die "Usage:\n\tch-2.pl   \n\nExample:\n\tch-2.pl \"i sing eat and love\" file1.txt file2.txt\n\n" if @ARGV < 2;

#Set the minimum length of words to be included
my $minimum_length = 1;

# Get the words to search. Convert it to lowercase. 
# Use as keys in a hash to get only unique words
my %hash_words = map { lc $_ => 1 } shift=~/([^, ]+) *,?/g;
# Retrieve the keys and store in @words
my @words = keys %hash_words;

# Create index from files
my @files = @ARGV;
my %index;
for my $file (@files) {
    open(my $fh, "<", $file) or die "Cannot open $file: $!\n";
    while(<$fh>) {
        for my $w (grep { y///c >=$minimum_length } /(\w+) ?/g) {
            $index{lc $w}{lc $file=~s/^\.\\//r}++
        }
    }
    close($fh);
}

use Text::Table::Tiny 'generate_table';
sub print_search_result {
    #Show search result
    my $rows;
    push @{$rows} , ["Words","File(s)"];
    for my $w (sort @words) {
        my $sub;
        for my $f (sort keys %{$index{$w}}) {
            $sub .= "$f ";
        }
        push @{$rows} , [$w,$sub||"(N/A)"];
    }
    print generate_table(rows => $rows, header_row => 1);
}

sub show_index {
    #Show File Index
    my $rows;
    push @{$rows} , ["Words","File(s)"];
    for my $w (sort keys %index) {
        my $sub;
        for my $f (sort keys %{$index{$w}}) {
            $sub .= "$f ";
        }
        push @{$rows} , [$w,$sub];
    }
    print generate_table(rows => $rows, header_row => 1);
}

print_search_result();

I used the Text::Table::Tiny module to print out a nice table. Output should look something like this:
perl .\ch-2.pl "i sing eat and love" .\file1.txt .\file2.txt .\file3.txt .\file4.txt .\file5.txt
+-------+--------------------------------+
| Words | File(s)                        |
+-------+--------------------------------+
| and   | file1.txt file2.txt            |
| eat   | file4.txt                      |
| i     | file1.txt file2.txt file4.txt  |
| love  | file2.txt file5.txt            |
| sing  | (N/A)                          |
+-------+--------------------------------+
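Two idioms in the solution are a bit terse, so here is a small standalone sketch of them: the regex that splits the query string on spaces and/or commas, and y///c, which counts the characters in $_ and so acts like length here. The sub names parse_query and long_enough are made up for this illustration.

```perl
use strict;
use warnings;

# Split a query on spaces and/or commas, lowercase it, and de-duplicate.
sub parse_query {
    my ($query) = @_;
    my %seen = map { lc($_) => 1 } $query =~ /([^, ]+) *,?/g;
    return sort keys %seen;
}

# y///c with an empty search list counts every character in $_
# without modifying it, so it behaves like length($_).
sub long_enough {
    my ($min, @words) = @_;
    return grep { y///c >= $min } @words;
}

print join('|', parse_query('I sing, eat,and  Sing')), "\n";  # and|eat|i|sing
print join('|', long_enough(3, qw(i eat and love))), "\n";     # eat|and|love
```

Raising the minimum length is a cheap way to keep very short words out of the index.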

About Yet Ebreo

I blog about Perl.