February 2010 Archives

Alien::ElasticSearch 0.05 and ElasticSearch.pm 0.03

Just released Alien::ElasticSearch (version 0.05 on its way to a CPAN near you).

This downloads, builds and installs the latest version of ElasticSearch from GitHub, which makes a live server available for automated testing of....

ElasticSearch.pm v 0.03, also on its way to a CPAN near you.

This version is completely rewritten to make it easier to extend later (more like one big dispatch table), and has improved debugging, usage messages, errors etc.

Try: (with a server running on localhost)

   use ElasticSearch;
   my $e = ElasticSearch->new( 
         servers  => '127.0.0.1:9200',
         trace_calls => 1,
   );

   $e->nodes;

... prints to STDERR:

curl -XGET 'http://127.0.0.2:9200/_cluster/nodes' 
# {
#    "clusterName" : "elasticsearch",
#    "nodes" : {
#       "getafix-25528" : {
#          "httpAddress" : "inet[/127.0.0.2:9200]",
#          "dataNode" : true,
#          "transportAddress" : "inet[getafix.traveljury.com/127.0.0.2:9300]",
#          "name" : "Miguel Espinosa"
#       }
#    }
# }

i.e., the curl command, which allows you to rerun your requests directly from the command line, and the ElasticSearch response, commented out.

And a test suite, which has helped to find a number of issues (now fixed) in ElasticSearch itself.

I'm thinking of adding an ElasticSearch::QueryBuilder to make it easier to generate the right query structure that Lucene and derivatives expect. (See here for an example of just how convoluted the query structure can be - note, that was my first attempt at a query, not sure if it is correct or not)

How should I write a test suite which depends on an external server?

I'm in the process of writing a test suite for ElasticSearch.pm, but in order for it to run any tests, it requires access to an ElasticSearch cluster.

Currently, I just skip all tests unless $ENV{ES_SERVER} is set, but this requires manual installation / testing.
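A minimal sketch of that skip, assuming Test::More. ES_SERVER is the variable named above; the host:port check is only a placeholder for a real test, and the actual suite may well use a whole-file `plan skip_all` instead of a SKIP block:

```perl
use strict;
use warnings;
use Test::More;

# ES_SERVER comes from the post; everything else here is illustrative.
my $server = $ENV{ES_SERVER};

SKIP: {
    # Without a server address there is nothing live to test against,
    # so report the one test in this block as skipped.
    skip 'ES_SERVER not set, so no live ElasticSearch server to test against', 1
        unless defined $server && length $server;

    # Placeholder assertion: a real suite would connect and run requests here.
    like( $server, qr/^[^:\s]+:\d+$/, 'ES_SERVER looks like host:port' );
}

done_testing();
```

Run it as `ES_SERVER=127.0.0.1:9200 prove -v t/` to exercise the live branch; with the variable unset the test is reported as skipped rather than failed.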

Alternatively, I could (if $ENV{ES_SERVER} isn't set) try to download and compile a test version, which requires git and Java 1.6 or higher. It doesn't take long to compile, but long enough that users may not want to do it by default.

So I could ask them if they want the script to build a test server, but again, this requires manual installation.

It'd be nice to use the CPAN Testers to run the test suite on multiple platforms, which implies building a test cluster by default.

What would you do?

Perl API for ElasticSearch

I was about to start implementing the Sphinx full-text search engine on our site when I saw that a new open source search engine, ElasticSearch, had just been released.

The overview shows off some of its many features but in summary, it:

  • is easy to set up
  • is designed to be distributed, and to scale from one node to hundreds
  • is real time
  • has a free search schema
  • is based on Lucene
  • speaks JSON over HTTP
  • supports multitenancy, which includes multiple indices, and multiple types per index, with the ability to query across any combination of the two

I liked the look of it so much that I've written a simple Perl API, which should be available on CPAN at: http://search.cpan.org/~drtech/ElasticSearch-0.01/

One nice thing that ElasticSearch.pm does is retrieve a list of all available nodes in the ElasticSearch cluster and try to spread the load across them automatically.

Also, if the current node disappears, then it tries to connect to the other nodes that it knows about. Only if no other nodes are available does it fail.
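The behaviour described above can be sketched roughly as follows. This is not the module's actual code: `make_dispatcher` and the fake request callback are hypothetical names invented for illustration, and the sketch assumes a simple round robin over a fixed node list.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that runs a request against the
# next node in turn, falling back to the remaining nodes if one fails.
sub make_dispatcher {
    my @nodes = @_;
    my $next  = 0;    # index of the node to try first on the next call

    return sub {
        my ($do_request) = @_;
        my @errors;
        for my $attempt ( 0 .. $#nodes ) {
            my $node = $nodes[ ( $next + $attempt ) % @nodes ];
            my $result;
            if ( eval { $result = $do_request->($node); 1 } ) {
                # Start from the following node next time to spread the load.
                $next = ( $next + $attempt + 1 ) % @nodes;
                return $result;
            }
            push @errors, "$node: $@";
        }
        # Only fail once every known node has been tried.
        die "No nodes available\n" . join( '', @errors );
    };
}

# Usage with a stand-in request that pretends the first node is down.
my $dispatch     = make_dispatcher( '127.0.0.1:9200', '127.0.0.1:9201' );
my $fake_request = sub {
    my ($node) = @_;
    die "connection refused\n" if $node eq '127.0.0.1:9200';
    return "response from $node";
};
print $dispatch->($fake_request), "\n";
```

A real client would also refresh the node list from the cluster, which this sketch leaves out.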

ElasticSearch.pm is an alpha release (doesn't even have a test suite yet), and feedback is more than welcome.

Getting a server running is dead simple. (You need at least Java 1.6.) On *nix:

cd ~
git clone git://github.com/elasticsearch/elasticsearch.git
cd elasticsearch
./gradlew clean devRelease

cd /path/where/you/want/elasticsearch
unzip ~/elasticsearch/distributions/elasticsearch*

To start a test server in the foreground, running on 127.0.0.1:9200:

./bin/elasticsearch -f

You can start multiple servers by repeating this command - they will autodiscover each other.

Then in Perl, you can test it out with:

use ElasticSearch;
use Data::Dump qw(pp);   ## just using pp to dump the return values

my $e = ElasticSearch->new( servers => '127.0.0.1:9200', debug => 1 );

# index a "document"
pp $e->index(
    index => 'twitter',
    type  => 'tweet',
    id    => 1,
    data  => {
        user        => 'kimchy',
        postDate    => '2009-11-15T14:12:12',
        message     => 'trying out Elastic Search'
    }
);

# retrieve it by ID
pp $e->get(
    index => 'twitter',
    type  => 'tweet',
    id    => 1
);

# search for it by query term
pp my $results = $e->search(
    index => 'twitter',
    type  => 'tweet',
    query => {
        term    => { user => 'kimchy' },
    }
);

The example above shows how easy it is to get started, but don't be fooled into thinking that ElasticSearch is a toy - while it hides a lot of complexity, it provides the functionality to tune your indexing and searches to the nth degree.

Git repo at http://github.com/clintongormley/ElasticSearch.pm

How do you handle Amazon EC2's failures?

We've just moved our website to Amazon EC2, and within about 2 hours of going live, our proxy server went down. Just disappeared. We couldn't even terminate the instance.

OK, temporary glitch. It happens.

2 weeks go by, then last night, our alarms go crazy. All 3 database servers have gone down. They're there, just not responding to ssh or even ping.

We try to reboot the instances. From the console log, I can see that they reboot. Still not accessible. I launch another instance of our DB AMI. It boots, but is also unresponsive.

Eventually we boot a vanilla…

Forcing IE to accept script tags in innerHTML

So, my first blog post, and instead of Perl, I'm writing about JavaScript.

I'm using a common idiom:

  • AJAX call returns HTML with embedded script tags
  • create a temporary <div>
  • div.innerHTML = request.responseText
  • move the children of the div to the appropriate spot

Firefox conveniently executes the script texts. Opera and Safari require extra steps to execute the script contents (e.g., globalEval in jQuery), and IE does whatever the hell it pleases.

IE usually works with globalEval, except when it doesn't.  I found that if the AJAX response was just a single script tag, then IE would filter it out.  But script tags were being created in certain circumstances.

Long story short, if you need to return a single <script> tag, wrap it in a <form> tag. For whatever reason, IE will then accept it as innerHTML and create the script node, which you can then execute with globalEval or similar.

About Clinton Gormley

The doctor will see you now...