Perl Weekly Challenge # 24: Smallest Script and Inverted Index

By laurent_r on September 3, 2019 8:33 AM

These are some answers to Week 24 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in several days from now (September 8, 2019). This blog post offers some solutions to this challenge. Please don't read on if you intend to complete the challenge on your own.

Challenge # 1: Smallest Script With No Execution Error

Create a smallest script in terms of size that on execution doesn’t throw any error. The script doesn’t have to do anything special. You could even come up with the smallest one-liner.

I was first puzzled by this strange specification. Can it be that we really want a script that does nothing? Does it have to be the shortest possible script?

Well, after reading again, yes, it seems so.

I'll go for one-liners.

My script in Perl 5:

$ perl -e ''

Just in case there is any doubt, we can check the return value under Bash to confirm that there was no error:

$ echo $?
0
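
The same check can also be made from within Perl rather than from the shell. The following is only a minimal sketch: it assumes that launching the empty one-liner through system and decoding $? is an acceptable stand-in for the Bash check above, and the variable names are mine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run the empty one-liner with the same perl binary that runs this
# script ($^X holds its path). system returns the raw wait status.
my $wait_status = system($^X, '-e', '');
die "Could not launch $^X: $!" if $wait_status == -1;

# The exit code proper is stored in the high byte of the wait status.
my $exit_code = $wait_status >> 8;
print "exit code: $exit_code\n";    # prints "exit code: 0"
```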

And this is my script in Perl 6:

$ perl6 -e ''

Note that, in both Perl 5 and Perl 6, creating an empty file and using it as a parameter to the perl or perl6 command line would work just as well, for example:

$ perl6 my-empty-file.pl

And that's it for the first challenge. Boy, that was a quick one.

Challenge # 2: Inverted Index

Create a script to implement full text search functionality using Inverted Index. According to wikipedia:

In computer science, an inverted index (also referred to as a postings file or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database.
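
To make the definition a bit more concrete, here is a toy illustration with two made-up documents. The file names and word lists are invented for the example; the point is only to show a forward index (document to words) being turned into an inverted index (word to documents).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Forward index: document name => words it contains (made-up data).
my %forward = (
    'a.txt' => [qw(perl hash index)],
    'b.txt' => [qw(perl regex)],
);

# Inverted index: word => set of documents containing that word.
my %inverted;
for my $doc (keys %forward) {
    $inverted{$_}{$doc} = 1 for @{ $forward{$doc} };
}

# Finding the documents for a word is now a single hash access.
for my $word (sort keys %inverted) {
    print "$word: ", join(", ", sort keys %{ $inverted{$word} }), "\n";
}
# Prints:
# hash: a.txt
# index: a.txt
# perl: a.txt, b.txt
# regex: b.txt
```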

Inverted Index in Perl 5

I do not find the Wikipedia explanation to be very clear, but I'll implement the following: I have on my file system a directory containing about 500 Perl scripts (with a '.pl' extension). My program will read all these files (line by line), split the lines into words and keep only the words made up entirely of word characters (letters, digits and underscores, which gets rid of operators and variable names with sigils) and with a length of at least 3 such characters. These words will be used to populate a hash (actually a HoH, a hash of hashes), so that for each such word, I'll be able to directly look up the names of all the files where this word is used.

This is fairly simple:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @files = glob "./*.pl";
my %dict;    # word => { file name => 1 }
for my $file (@files) {
    open my $IN, "<", $file or die "Cannot open $file: $!";
    while (my $line = <$IN>) {
        # Keep only whitespace-separated tokens made of 3 or more word characters
        my @words = grep { /^\w{3,}$/ } split /\s+/, $line;
        $dict{$_}{$file} = 1 for @words;
    }
    close $IN;
}
print Dumper \%dict;

The output has a bit less than 20,000 lines, which read in part as follows:

'checkdir' => {
                './monitor_files.pl' => 1,
                './monitor_files2.pl' => 1
              },
'start' => {
             './solver.pl' => 1,
             './url_regex.pl' => 1,
             './teams.pl' => 1,
             './test_start.pl' => 1,
             './markov_analysis.pl' => 1
           },
'1000' => {
            './first.pl' => 1,
            './jam1.pl' => 1
          },
'Minimal' => {
               './vigenere.pl' => 1
             },
'last' => {
            './strong_primes.pl' => 1,
            './pm_1196078.pl' => 1,
            './bench_lazy_map.pl' => 1,
            './inter_pairs.pl' => 1,
            './ladder2.pl' => 1,
            './perfect.pl' => 1,
            './homophones.pl' => 1,
            './pairs.pl' => 1,
            (...)
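
Once the hash is populated, searching it is just a matter of hash lookups. The sketch below is not part of my original script: the search subroutine is a hypothetical helper, and the sample index is made up. It returns the files that contain all of the words passed to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the sorted list of files containing every
# word in @words, given an index shaped like %dict (word => { file => 1 }).
sub search {
    my ($index, @words) = @_;
    return unless @words;
    my ($first, @rest) = @words;
    # Start with the files of the first word, then narrow down.
    my @hits = keys %{ $index->{$first} // {} };
    for my $word (@rest) {
        my $files = $index->{$word} // {};
        @hits = grep { exists $files->{$_} } @hits;
    }
    return sort @hits;
}

# Made-up sample index, for illustration only.
my %dict = (
    start => { './one.pl' => 1, './two.pl'   => 1 },
    last  => { './two.pl' => 1, './three.pl' => 1 },
);

print "$_\n" for search(\%dict, 'start', 'last');    # prints ./two.pl
```

Using "// {}" for missing words avoids autovivifying new entries in the index when a search term is not found.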

It wouldn't be difficult to store the output into a text file (that can then be reloaded into a Perl script hash) or into a database, or to find some other way of making the data persistent, but I have little use for such an index and the challenge specification does not request anything of that type. So, I will not try to go further.
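
For the record, one possible way of making the data persistent, among many others, is the core Storable module. This is only a sketch under assumptions: the file name is arbitrary and the sample hash is made up to stand in for the real %dict.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Made-up sample index standing in for the real %dict.
my %dict = (
    start => { './one.pl' => 1, './two.pl' => 1 },
);

# The file name is an arbitrary choice for this sketch.
my $index_file = 'inverted-index.sto';

store(\%dict, $index_file);              # serialize the HoH to disk
my $reloaded = retrieve($index_file);    # hash reference with the same shape

print "$_\n" for sort keys %{ $reloaded->{start} };
```

A later run of a search script could then call retrieve alone and skip reading the 500 files again.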

Inverted Index in Perl 6

We'll do the same thing in Perl 6, but with another directory containing about 350 Perl 6 programs (with ".p6" or ".pl6" extensions).

use v6;

my @files = grep { /\.p6$/ or /\.pl6$/ }, dir('.');
my %dict;
for @files -> $file {
    for $file.IO.lines.words.grep({/^ \w ** 3..* $/}) -> $word {
        %dict{$word}{$file} = True;
    }
}
.say for %dict{'given'}.keys;

The program duly prints out the list of files with the given keyword:

$ perl6 inverted-index.p6
mult_gram.p6
calc_grammar.pl6
calculator-exp.pl6
VMS_grammar.p6
ana2.p6
calc_grammar2.pl6
ArithmAction.pl6

[... lines omitted for brevity]

normalize_url.p6
calculator.p6
arithmetic.pl6
json_grammar_2.pl6
point2d.pl6
arithmetic2.pl6
forest.p6

Wrapping up

Next week's Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British Summer Time) on Sunday, September 15. And, please, also spread the word about the Perl Weekly Challenge if you can.

About laurent_r

I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.
