Perl Weekly Challenge W024 - Smallest Script, Inverted Index

By Yet Ebreo on September 4, 2019 3:29 PM under PWC

I've been doing the Perl Weekly Challenge (PWC) for 3 weeks now. So far there have been unique challenges that made me use different modules. I even submitted a solution using APIs, which I haven't done at work because I never had a reason to. (lol)

If you'd like to join the fun and contribute, please visit the site managed by Mohammad S Anwar.

Task #1 - Smallest Script:
The tasks for this week's challenge (#24) were a bit confusing at first, but I just did what was asked. The first task was to create the smallest script, as described below:

Create a smallest script in terms of size that on execution doesn’t throw any error. The script doesn’t have to do anything special. You could even come up with smallest one-liner.

There is no problem to solve, so in my entry I just put a $%:

perl -e '$%'

The code did not throw any error, and neither did an empty script when I tried one. I hope next week's will at least be a golfing challenge. And that was Task #1, moving on to

Task #2 - Inverted Index:
I honestly hadn't heard of the term before. Good thing the task includes the Wikipedia description:

Create a script to implement full text search functionality using Inverted Index. According to wikipedia:

In computer science, an inverted index (also referred to as a postings file or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database.
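
To make the contrast in that description concrete, here is a minimal sketch that turns a forward index (document to words) into an inverted index (word to documents). The document names, the sample words and the invert_index helper are made up for illustration and are not part of my submitted solution.

```perl
use strict;
use warnings;

# Hypothetical forward index: document name => list of words it contains.
my %forward = (
    'doc1.txt' => [qw(i love perl)],
    'doc2.txt' => [qw(i eat and sing)],
);

# Invert a forward index (document => words) into word => sorted documents.
sub invert_index {
    my (%fwd) = @_;
    my %seen;
    for my $doc (keys %fwd) {
        # Use a nested hash so each document is recorded once per word.
        $seen{$_}{$doc} = 1 for @{ $fwd{$doc} };
    }
    my %inverted;
    for my $word (keys %seen) {
        $inverted{$word} = [ sort keys %{ $seen{$word} } ];
    }
    return %inverted;
}

my %inverted = invert_index(%forward);
print "$_: @{ $inverted{$_} }\n" for sort keys %inverted;
```

Looking up a word is then a single hash access, which is where the fast full-text search comes from; the cost is paid up front when the index is built.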

In my solution, I used a hash to associate each word with the files in which it was found. It follows the structure:

$index{$word}{$file}

The hash is later used to print the file(s) where the search keys can be found.
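
As a rough sketch of that nested-hash idea, the snippet below builds the index from in-memory strings instead of files and then looks words up. The build_index and lookup helpers and the sample texts are assumptions for illustration only; the full script further down reads real files.

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator

# Build $index->{$word}{$name} = occurrence count from in-memory "documents".
# %docs (name => text) stands in for the files read by the real script.
sub build_index {
    my (%docs) = @_;
    my %index;
    for my $name (keys %docs) {
        $index{ lc $_ }{$name}++ for $docs{$name} =~ /\w+/g;
    }
    return \%index;
}

# Return the sorted document names containing $word (empty list if none).
sub lookup {
    my ($index, $word) = @_;
    # "// {}" avoids creating an entry for a word that was never indexed.
    my @found = sort keys %{ $index->{ lc $word } // {} };
    return @found;
}

my $index = build_index(
    'file1.txt' => 'I sing and I eat',
    'file2.txt' => 'I love Perl',
);
for my $word (qw(i love dance)) {
    my @found = lookup($index, $word);
    printf "%-5s => %s\n", $word, @found ? "@found" : '(N/A)';
}
```

The inner hash value is a count, so the same structure could also rank files by how often a word occurs, although here only the keys are used.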

Solution:

use strict;
use warnings;
use 5.014;    # the s///r modifier used below needs Perl 5.14 or later

die "Usage:\n\tch-2.pl 
  \n\nExample:\n\tch-2.pl \"i sing eat and love\" file1.txt file2.txt\n\n" if @ARGV < 2;

# Set the minimum length of words to be included
my $minimum_length = 1;

# Get the words to search and convert them to lowercase.
# Use them as keys in a hash to keep only unique words.
my %hash_words = map { lc $_ => 1 } shift=~/([^, ]+) *,?/g;

# Retrieve the keys and store them in @words
my @words = keys %hash_words;

# Create the index from the files
my @files = @ARGV;
my %index;
for my $file (@files) {
    open(my $fh, "<", $file) or die "Cannot open $file: $!\n";
    while(<$fh>) {
        # Keep words of at least $minimum_length characters (y///c counts the characters in $_)
        for my $w (grep { y///c >=$minimum_length } /(\w+) ?/g) {
            # Key by lowercased word and file name, stripping a leading ".\"
            $index{lc $w}{lc $file=~s/^\.\\//r}++
        }
    }
    close($fh);
}

use Text::Table::Tiny 'generate_table';

sub print_search_result {
    # Show the search result
    my $rows;
    push @{$rows} , ["Words","File(s)"];
    for my $w (sort @words) {
        my $sub;
        for my $f (sort keys %{$index{$w}}) {
            $sub .= "$f ";
        }
        push @{$rows} , [$w,$sub||"(N/A)"];
    }
    print generate_table(rows => $rows, header_row => 1);
}

sub show_index {
    # Show the file index
    my $rows;
    push @{$rows} , ["Words","File(s)"];
    for my $w (sort keys %index) {
        my $sub;
        for my $f (sort keys %{$index{$w}}) {
            $sub .= "$f ";
        }
        push @{$rows} , [$w,$sub];
    }
    print generate_table(rows => $rows, header_row => 1);
}

print_search_result();
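
The search-word handling near the top of the script can also be tried on its own. The sketch below is a simplified variant, not the exact expression from my solution: it swaps the regex match for split, and the unique_words name is made up for illustration.

```perl
use strict;
use warnings;

# Split a query on commas and/or whitespace, lowercase each word, and
# drop duplicates by using the words as hash keys.
sub unique_words {
    my ($query) = @_;
    my %seen = map { ( lc($_) => 1 ) } grep { length } split /[,\s]+/, $query;
    my @words = sort keys %seen;
    return @words;
}

print join(' ', unique_words('I sing, eat and I LOVE')), "\n";
# prints "and eat i love sing"
```

Using the words as hash keys is what removes the repeated "I"; the sort is only there to make the order predictable, since hash keys come back in no particular order.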

I used the Text::Table::Tiny module to print out a nice table. Output should look something like this:

perl .\ch-2.pl "i sing eat and love" .\file1.txt .\file2.txt .\file3.txt .\file4.txt .\file5.txt

+-------+--------------------------------+
| Words | File(s)                        |
+-------+--------------------------------+
| and   | file1.txt file2.txt            |
| eat   | file4.txt                      |
| i     | file1.txt file2.txt file4.txt  |
| love  | file2.txt file5.txt            |
| sing  | (N/A)                          |
+-------+--------------------------------+
