How to set up your own PerlMongers web service in 10 minutes or less
I've been working with ElasticSearch over the past few months as part of the MetaCPAN project. Using ElasticSearch as our back end has worked out really well so far. The reason is that, out of the box, it provides a REST API. So, in our case, we've been able to concentrate on writing code and not on designing an API, defining its behaviour, arguing over URL schemes, etc.
To be clear, ES is not written in Perl, but there is a handy Perl module you can use to get yourself up and running in *minutes*.
As my example of how to run your own web service, I've chosen to create a service which hosts info on all of the PerlMongers groups in the comprehensive XML file that pm.org publishes. We're going to grab it, parse it and stuff it into ElasticSearch. Once you have the logic of that part down (and there's not a lot to it), you're basically done.
So, to start, just grab yourself a copy of ElasticSearch. Unzip it and run the following command:
bin/elasticsearch -f
You now have an ES server running in the foreground on port 9200. That's it!
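If you want to double-check that the server is actually listening before going any further, a request against the root URL returns a small JSON status document. This is just a sanity check of my own, using plain LWP rather than anything ES-specific:

#!/usr/bin/env perl
use strict;
use warnings;
use LWP::Simple qw( get );

# the root endpoint returns a small JSON document with version info
my $banner = get('http://localhost:9200/');

die "no ElasticSearch server answering on port 9200\n" unless $banner;
print $banner;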
Now, for the script:
#!/usr/bin/env perl

use Modern::Perl;

use ElasticSearch;
use Encode;
use Try::Tiny;
use WWW::Mechanize::Cached;
use XML::Simple;

my $index_name = 'perlmongers';

my $es = ElasticSearch->new(
    servers   => 'localhost:9200',
    transport => 'httplite',
    timeout   => 30,
);

# if we're running this script the 1st time, there's no index to drop
try { $es->delete_index( index => $index_name ); };

# index doesn't exist until we create it explicitly
$es->create_index( index => $index_name );

# cache the XML because we may be running the script a lot in dev
my $mech = WWW::Mechanize::Cached->new;
$mech->get( 'http://www.pm.org/groups/perl_mongers.xml' );

# we don't want empty elements represented as {}
my $xml = XMLin( $mech->content, SuppressEmpty => undef );

foreach my $pm_name ( sort keys %{ $xml->{group} } ) {

    my $group = $xml->{group}->{$pm_name};

    my %to_insert = (
        name      => $pm_name,
        tsar_name => $group->{tsar}->{name} || undef,
        web       => $group->{web} || undef,
    );

    foreach my $geo ( keys %{ $group->{location} } ) {
        $to_insert{$geo} = $group->{location}->{$geo};
    }

    # fix any encoding problems
    foreach my $key ( keys %to_insert ) {
        if ( $to_insert{$key} ) {
            $to_insert{$key} = encode_utf8( $to_insert{$key} );
        }
    }

    my %update = (
        index => $index_name,
        type  => 'group',
        id    => $group->{id},
        data  => \%to_insert,
    );

    # you should probably check return values
    my $result = $es->index( %update );
}
There's nothing fancy going on here. First, I've created an index called perlmongers. Then I've grabbed what I thought were the interesting parts of the XML data and stuffed them into a hash, which I've used to populate a type called group. The id I've used for these groups is the internal id already listed in the XML file. I've made the (hopefully safe) assumption that these ids will not change in the future.
To see what is now in the index, run a query on your local ES server via your web browser:
http://localhost:9200/perlmongers/group/_search?q=*&size=25
This will return the first 25 entries. You'll see that they look something like this:
{ "_index": "perlmongers", "_type": "group", "_id": "113", "_score": 1, "_source": { "country": "Denmark", "longitude": "10.216", "region": null, "name": "Aarhus.pm", "state": null, "tsar_name": "Lars Balker Rasmussen", "web": "http://aarhus.pm.org/", "city": "Aarhus", "continent": "Europe", "latitude": "56.15" } },
The ES server returns your data in JSON. Nice! Now, if you want to see just one result, you can search on a name:
http://localhost:9200/perlmongers/group/_search?q=name:toronto.pm
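That q= syntax corresponds to a query_string query in the query DSL, so the Perl equivalent (again, just a sketch reusing the $es object from above) would look something like:

my $result = $es->search(
    index => 'perlmongers',
    type  => 'group',
    query => {
        query_string => { query => 'name:toronto.pm' },
    },
);

say $result->{hits}{total} . ' hit(s) for toronto.pm';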
You can also use an id if you happen to know it:
http://localhost:9200/perlmongers/group/103
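From Perl, a lookup by id doesn't need a search at all; the client's get() method fetches the document directly. A minimal sketch, assuming the get() signature from the ElasticSearch.pm docs:

my $group = $es->get(
    index => 'perlmongers',
    type  => 'group',
    id    => 103,
);

say $group->{_source}{name};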
Maybe you want to search on state:
http://localhost:9200/perlmongers/group/_search?q=state:new york
Anyway, I think you can see where I'm going with this. How much work did you really have to do here? You had to munge a bit of data, but that's basically it. Along with that, you get all the goodness of a rich API which supports the basic queries I've shown here, along with much more complex queries that you can dig into in the ES docs.
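To give you a taste of those more complex queries, here's one possible sketch using the bool query from the query DSL: European groups, minus Denmark. The lowercase terms are an assumption on my part; the default analyzer lowercases field values at index time, so term queries need to match that form.

my $result = $es->search(
    index => 'perlmongers',
    type  => 'group',
    size  => 100,
    query => {
        bool => {
            must     => [ { term => { continent => 'europe' } } ],
            must_not => [ { term => { country   => 'denmark' } } ],
        },
    },
);

say $_->{_source}{name} for @{ $result->{hits}{hits} };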
If you want to see ES in action, check out some of the sample URLs which are posted on the MetaCPAN wiki. If you have any questions, feel free to join us on #metacpan at irc.freenode.net. If you feel like contributing some code, please hit us up on IRC as well.
Hi Oalders
nice post!
One question: why do you encode to UTF8? That should be handled by ElasticSearch.pm. Did you see any issues with encoding? I tried deleting that block and rerunning the script, and the encoding seems fine.
ta
Clint
Hi Clinton,
I've put up a gist with the error: https://gist.github.com/822286. It went away after I handled the encoding, but from the error message, I really couldn't tell exactly what the issue was.
Thanks,
Olaf
Hmmm, that's weird. Especially as that particular entry has only ASCII characters.
What's the error you get in the elasticsearch log?
I wonder if this is a version thing - what versions of Perl, XML::Simple, WWW::Mechanize and WWW::Mechanize::Cached are you using?
clint
Hi Clint,
I've updated the Gist with the server errors. You are correct about the modules. When I switch from Mechanize::Cached to Mechanize, I no longer need to encode. Weird.
Olaf