Perl Weekly Challenge 024: Inverted Index and Shortest Oneliner

By E. Choroba on September 8, 2019 11:41 PM under perl-weekly-challenge

I’ll start with the second task, as the first one is somewhat different (see below).

Inverted Index

Create a script to implement full text search functionality using Inverted Index.

An inverted index is an index storing a mapping from content to its location. I chose to store the filename and line number for all words in a given list of files.

I decided to use a Perl structure instead of a database to store the index, and to use Storable to make it persistent. A hash of hashes of arrays seemed the most natural to me: for each word, an inner hash keyed by file name holds an array of line numbers. I started writing the main program:

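use warnings;
use strict;
use feature qw{ say };  # the subroutines below use say

# Default to an empty string so a missing action falls through to unknown().
my $action = shift(@ARGV) // '';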

my %dispatch = (
    help   => \&help,
    create => \&create,
    search => \&search,
);

my $run = $dispatch{$action} // \&unknown;

$run->(@ARGV);

The defined-or operator // is probably familiar to everyone nowadays, but remembering my struggles with Perl 5.8.3 at my $job - 1 four years ago, I still prefer to add

use Syntax::Construct qw{ // };

whenever I use it. If you’re on 5.10+, nothing will happen, but people on older Perl versions will get a meaningful error message telling them they need at least 5.10 to run the program, instead of the standard "Search pattern not terminated at ..."

Some of the subroutines are obvious:

sub help {
    say STDERR "Usage: $0 help";
    say STDERR "       $0 create index doc1 doc2...";
    say STDERR "       $0 search index term";
}

sub unknown {
    help();
    die "Unknown action\n";
}

To implement a persistent index, we need to add

use Storable qw{ store retrieve };

Creating the index is simple. Loop over the files, populate the structure, and save it when done. We can take advantage of the $. variable to get the current line number. Note that a “word” corresponds to Perl’s notion, i.e. \w+.

sub create {
    my ($index_file, @documents) = @_;
    my %index;
    for my $document (@documents) {
        warn "Indexing $document\n";
        open my $in, '<', $document or die "$document: $!";
        while (<$in>) {
            push @{ $index{$1}{$document} }, $. while /(\w+)/g;
        }
    }
    store(\%index, $index_file);
}
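To make the shape of the index concrete, here is a minimal sketch that builds the same hash of hashes of arrays from strings held in memory instead of files. The document names and their contents are made up for illustration, and a manual counter stands in for $. because no filehandle is involved.

```perl
use strict;
use warnings;
use feature qw{ say };

# Hypothetical documents held in memory instead of on disk.
my %documents = (
    'a.txt' => "the quick brown fox\njumps over the lazy dog\n",
    'b.txt' => "the dog barks\n",
);

my %index;
for my $name (sort keys %documents) {
    my $line_number = 0;
    for my $line (split /\n/, $documents{$name}) {
        ++$line_number;
        # Same idea as in create(): one entry per word occurrence.
        push @{ $index{$1}{$name} }, $line_number while $line =~ /(\w+)/g;
    }
}

# Each word maps to a hash of document names, each holding a list of lines.
for my $name (sort keys %{ $index{the} }) {
    say "$name: @{ $index{the}{$name} }";
}
# Prints:
# a.txt: 1 2
# b.txt: 1
```

Note that a word occurring twice on the same line would get that line number pushed twice, which is also how the create subroutine above behaves.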

Searching is even simpler: load the structure back from the index file and output the filenames and line numbers corresponding to the given term.

sub search {
    my ($index_file, $term) = @_;
    my %index = %{ retrieve($index_file) };
    for my $document (sort keys %{ $index{$term} }) {
        say "$document: ", join ' ', @{ $index{$term}{$document} };
    }
}
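The round trip through Storable can be tried in isolation. The following is only a sketch under a few assumptions: the index is a tiny hand-made one, it is stored in a temporary file created with File::Temp, and lookup is a hypothetical helper (not part of the script above) that returns an empty hash for a term that was never indexed.

```perl
use strict;
use warnings;
use feature qw{ say };
use Storable qw{ store retrieve };
use File::Temp qw{ tempfile };

# A tiny hand-made index (hypothetical data) in the shape used above.
my %index = (
    perl => { 'a.txt' => [1, 3], 'b.txt' => [2] },
    hash => { 'a.txt' => [2] },
);

# Temporary index file, removed when the program exits.
my (undef, $index_file) = tempfile(UNLINK => 1);
store(\%index, $index_file);

# lookup() is a hypothetical helper, not part of the original script.
sub lookup {
    my ($file, $term) = @_;
    my $loaded = retrieve($file);
    # Unknown term: return an empty hash so the caller sees no hits.
    return {} unless exists $loaded->{$term};
    return $loaded->{$term};
}

my $hits = lookup($index_file, 'perl');
say "$_: @{ $hits->{$_} }" for sort keys %$hits;
say 'nothing found' unless %{ lookup($index_file, 'python') };
```

Returning an empty hash keeps the calling code uniform: it can loop over the keys whether or not the term exists.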

Shortest Oneliner

Create a smallest script in terms of size that on execution doesn’t throw any error. The script doesn’t have to do anything special. You could even come up with smallest one-liner.

The shortest one-liner that doesn’t throw any error is the empty one. Perl is OK with a program that does nothing and ends as soon as it has started. So, the solution is

perl -e ''

But I tried to read between the lines. Maybe the Team is running out of ideas and this is a way to ask for help?

Currently, I don’t have enough time to prepare a task myself (yes, I know how time consuming it could be, I was a university teacher), but here are some links where you can find inspiration:
