Perl Weekly Challenge 024: Inverted Index and Shortest Oneliner
I’ll start with the second task, as the first one is somewhat different (see below).
Inverted Index
Create a script to implement full text search functionality using Inverted Index.
An inverted index is an index storing a mapping from content to its location. I chose to store the filename and line number for all words in a given list of files.
I decided to use a Perl structure instead of a database to store the index, and to use Storable to make it persistent. A hash of hashes of arrays seemed the most natural to me: the outer hash is keyed by word, the inner hash by file name, and the innermost array holds the line numbers. I started writing the main program:
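use warnings;
use strict;
use feature qw{ say };

my $action = shift;
my %dispatch = (
    help   => \&help,
    create => \&create,
    search => \&search,
);
# Fall back to unknown() when the action is missing or not in the table.
my $run = $dispatch{ $action // '' } // \&unknown;
$run->(@ARGV);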
The defined-or operator // is probably familiar to everyone nowadays, but remembering my struggles with Perl 5.8.3 at my $job - 1 four years ago, I still prefer to add
use Syntax::Construct qw{ // };
whenever I use it. If you’re on 5.10+, nothing will happen, but people on older Perl versions will get a meaningful error message telling them they need 5.10 at least to run the program, instead of the standard Search pattern not terminated at ...
Some of the subroutines are obvious:
sub help {
    say STDERR "Usage: $0 help";
    say STDERR "       $0 create index doc1 doc2...";
    say STDERR "       $0 search index term";
}
sub unknown {
    help();
    die "Unknown action\n";
}
To implement a persistent index, we need to add
use Storable qw{ store retrieve };
Creating the index is simple. Loop over the files, populate the structure, and save it when done. We can take advantage of the $. variable to get the current line number. Note that a “word” corresponds to Perl’s notion, i.e. \w+.
sub create {
    my ($index_file, @documents) = @_;
    my %index;
    for my $document (@documents) {
        warn $document;    # report progress on STDERR
        open my $in, '<', $document or die "$document: $!";
        while (<$in>) {
            # Record the current line number ($.) for every word on the line.
            push @{ $index{$1}{$document} }, $. while /(\w+)/g;
        }
    }
    store(\%index, $index_file);
}
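To make the shape of the resulting structure concrete, here is a minimal sketch that runs the same indexing loop over two in-memory documents instead of files. The %text_of hash and its contents are made up for the illustration; everything else mirrors the loop in create.

```perl
use strict;
use warnings;

# Hypothetical documents held in memory, so the sketch needs no files on disk.
my %text_of = (
    'a.txt' => "the quick fox\nthe lazy dog\n",
    'b.txt' => "a fox\n",
);

my %index;
for my $document (sort keys %text_of) {
    # Opening a reference to a scalar gives a read-only in-memory filehandle.
    open my $in, '<', \$text_of{$document} or die "$document: $!";
    while (<$in>) {
        # $. is the current line number of the handle read last.
        push @{ $index{$1}{$document} }, $. while /(\w+)/g;
    }
}

# %index now has the shape word => { file => [ line numbers ] }.
for my $word (sort keys %index) {
    for my $document (sort keys %{ $index{$word} }) {
        print "$word\t$document\t@{ $index{$word}{$document} }\n";
    }
}
```

One consequence of pushing on every match is that a word occurring twice on the same line has that line number recorded twice. Whether that is wanted depends on how the results are going to be used.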
Searching is even simpler: retrieve the stored structure and output the filenames and line numbers corresponding to the given term.
sub search {
    my ($index_file, $term) = @_;
    my %index = %{ retrieve($index_file) };
    for my $document (keys %{ $index{$term} }) {
        say "$document: ", join ' ', @{ $index{$term}{$document} };
    }
}
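The persistence step can be tried in isolation. The following is a minimal sketch of a Storable round trip through a temporary file, assuming a small hand-written index; the data and the lines_for helper are hypothetical and not part of the solution above.

```perl
use strict;
use warnings;
use Storable   qw{ store retrieve };
use File::Temp qw{ tempfile };

# A tiny hand-written index (hypothetical data): word => file => line numbers.
my %index = (
    fox => { 'a.txt' => [1], 'b.txt' => [1] },
    the => { 'a.txt' => [1, 2] },
);

# tempfile() returns an open handle and a file name; only the name is needed.
my ($fh, $index_file) = tempfile(UNLINK => 1);
close $fh;

store(\%index, $index_file);           # write the structure to disk
my $loaded = retrieve($index_file);    # read it back as a hash reference

# Hypothetical helper: guard with exists so that a missing term
# does not autovivify an entry in the loaded index.
sub lines_for {
    my ($idx, $term) = @_;
    return {} unless exists $idx->{$term};
    return $idx->{$term};
}

my $hits = lines_for($loaded, 'fox');
for my $document (sort keys %$hits) {
    print "$document: @{ $hits->{$document} }\n";
}
```

Sorting the keys is a choice made here only to get a stable order, since Perl does not guarantee any particular order for hash keys. Working with the reference returned by retrieve also avoids copying the whole hash, which may matter for a large index.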
Shortest Oneliner
Create a smallest script in terms of size that on execution doesn’t throw any error. The script doesn’t have to do anything special. You could even come up with smallest one-liner.
The shortest one-liner that doesn’t throw any error is the empty one. Perl is OK with a program that does nothing and ends as soon as it starts. So, the solution is
perl -e ''
But I tried to read between the lines. Maybe the Team is running out of ideas and this is a way to ask for help?
Currently, I don’t have enough time to prepare a task myself (yes, I know how time-consuming it can be, I was a university teacher), but here are some links where you can find inspiration: