ElasticSearch.pm gets big performance boost
ElasticSearch version 0.12 is out today along with some nice new features.
However, the thing I'm most excited about is that ElasticSearch.pm v 0.26 is also out and has support for bulk indexing and pluggable backends, both of which add a significant performance boost.
Pluggable backends
I've factored out the parts which actually talk to the ElasticSearch server into the ElasticSearch::Transport module, which acts as a base class for ElasticSearch::Transport::HTTP (which uses LWP), ::HTTPLite (which uses, not surprisingly, HTTP::Lite) and ::Thrift (which uses the Thrift protocol).
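To make the base-class idea concrete, here is a minimal sketch of the pattern. Every name in it (My::Transport, send_request, the backend map, the server address) is hypothetical and is not the module's real code; the subclasses only return strings instead of talking to a server.

```perl
use strict;
use warnings;

# Hypothetical names throughout: this only illustrates the base-class
# pattern and is not how ElasticSearch::Transport is actually written.
{
    package My::Transport;

    # Map a short backend name to the class that implements it
    my %backends = (
        http     => 'My::Transport::HTTP',
        httplite => 'My::Transport::HTTPLite',
    );

    sub new {
        my ( $class, %args ) = @_;
        my $name = defined $args{transport} ? $args{transport} : 'http';
        my $backend = $backends{$name}
            or die "Unknown transport '$name'\n";
        return bless { servers => $args{servers} }, $backend;
    }

    # Shared logic lives in the base class; only send_request() differs
    sub request {
        my ( $self, $method, $path ) = @_;
        return $self->send_request( uc $method, $path );
    }

    sub send_request { die "send_request() must be overridden\n" }
}

{
    package My::Transport::HTTP;
    our @ISA = ('My::Transport');
    # Stub: a real backend would issue the request with LWP here
    sub send_request { my ( $self, $m, $p ) = @_; return "LWP: $m $p" }
}

{
    package My::Transport::HTTPLite;
    our @ISA = ('My::Transport');
    # Stub: a real backend would issue the request with HTTP::Lite here
    sub send_request { my ( $self, $m, $p ) = @_; return "HTTP::Lite: $m $p" }
}

package main;

my $t = My::Transport->new(
    transport => 'httplite',
    servers   => 'localhost:9200',    # assumed address, for illustration
);
print $t->request( get => '/' ), "\n";
```

The point of the design is that callers only ever see request(), so swapping LWP for HTTP::Lite is a one-word change in the constructor.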
I expected Thrift to be the big winner, but it turns out that the generated code is dog-slow. However, HTTP::Lite is about 20% faster than LWP:
httplite : 63 seconds, 951 tps
http : 79 seconds, 759 tps
thrift : 690 seconds, 87 tps
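The tps column is just documents divided by elapsed seconds, which can be checked with a few lines. This sketch uses the 59,950 documents mentioned further down and the whole-second timings listed here, so it should land within one tps of the reported figures rather than match them exactly. It also shows that the "about 20%" is the saving in elapsed time; expressed as throughput, the gain comes out a little higher.

```perl
use strict;
use warnings;

# Figures taken from the benchmark in this post
my $docs    = 59_950;
my %seconds = ( httplite => 63, http => 79, thrift => 690 );

# Throughput is documents divided by elapsed seconds
sub tps { my ($secs) = @_; return $docs / $secs }

# Fastest transport first
for my $name ( sort { $seconds{$a} <=> $seconds{$b} } keys %seconds ) {
    printf "%-8s : %3d seconds, %6.1f tps\n",
        $name, $seconds{$name}, tps( $seconds{$name} );
}

# HTTP::Lite against LWP, measured both ways
my $time_saved = 100 * ( 1 - $seconds{httplite} / $seconds{http} );
my $tps_gain   = 100 * ( $seconds{http} / $seconds{httplite} - 1 );
printf "elapsed time saved: %.0f%%\n", $time_saved;
printf "throughput gained:  %.0f%%\n", $tps_gain;
```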
Bulk indexing
Since version 0.11, ElasticSearch has had a bulk operation, which can take a stream of index, create and delete statements in a single request.
For instance, you could do:
$es->bulk( [
    { index  => { index => 'foo', type => 'bar', id => 1, data => { foo => 'bar' } } },
    { create => { index => 'foo', type => 'bar', id => 2, data => { foo => 'bar' } } },
    { delete => { index => 'foo', type => 'bar', id => 1 } },
] );
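For the curious, the "stream" that the REST bulk endpoint accepts is newline-delimited JSON: one line of action metadata, followed by one line holding the document for index and create. The sketch below turns an action list like the one above into that body. The bulk_body helper is hypothetical and I'm assuming the mapping of index/type/id to _index/_type/_id; it is not a copy of what ElasticSearch.pm does internally.

```perl
use strict;
use warnings;
use JSON::PP ();

# canonical() sorts keys so the output is stable between runs
my $json = JSON::PP->new->canonical(1);

# Hypothetical helper: serialise Perl-side actions into the
# newline-delimited body used by the bulk endpoint.
sub bulk_body {
    my ($actions) = @_;
    my $body = '';
    for my $action (@$actions) {
        # each action is a hash with exactly one key: the operation name
        my ( $op, $params ) = %$action;
        my %meta = map { ( "_$_" => $params->{$_} ) } qw(index type id);
        $body .= $json->encode( { $op => \%meta } ) . "\n";

        # delete carries no document, so it has no second line
        $body .= $json->encode( $params->{data} ) . "\n"
            if $op ne 'delete';
    }
    return $body;
}

my $body = bulk_body( [
    { index  => { index => 'foo', type => 'bar', id => 1, data => { foo => 'bar' } } },
    { create => { index => 'foo', type => 'bar', id => 2, data => { foo => 'bar' } } },
    { delete => { index => 'foo', type => 'bar', id => 1 } },
] );
print $body;
```

Three actions become five lines, because the delete has no document line.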
The number of actions you can pass in depends on how much memory you have, both on the client and the server, and how big your documents are.
I tried tranches of 1,000, 5,000 and 10,000 documents at a time, and the results were very similar.
All tranches and all transports averaged about 7.5 seconds, or roughly 8,000 transactions per second! These are small documents, so I would be surprised to achieve this rate in the real world, but a roughly 10x improvement over indexing one document per request is phenomenal.
(These benchmarks were run on my laptop with a single ElasticSearch node, over 59,950 documents ({ text => $string }) whose string value averaged 310 characters in length and consisted of real-world text, not randomly generated gibberish.)
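Splitting a large list of actions into tranches only takes a few lines. This is a minimal sketch, not the benchmark script I used: the tranches helper and the stand-in documents are made up for illustration, and the point where a live client would call bulk is marked with a comment.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of bulk actions into tranches of at
# most $size, so that each request fits in client and server memory.
sub tranches {
    my ( $size, @actions ) = @_;
    die "tranche size must be positive\n" if $size < 1;
    my @batches;
    # splice removes up to $size items from the front on each pass
    push @batches, [ splice @actions, 0, $size ] while @actions;
    return @batches;
}

# Stand-in actions, shaped like the ones passed to bulk
my @actions = map {
    +{ create => { index => 'foo', type => 'bar', id => $_, data => { text => "doc $_" } } }
} 1 .. 2_500;

for my $batch ( tranches( 1_000, @actions ) ) {
    # With a live client, this is where $es->bulk($batch) would go
    printf "would send %d actions\n", scalar @$batch;
}
```

Because the tranche size is a single argument, trying 1,000, 5,000 and 10,000 is just a matter of changing one number.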
Example script
(This is now included in the examples directory of ElasticSearch.pm)
Finally, here is a simple example script which downloads all of the issues open against ElasticSearch from GitHub, indexes them, and provides a simple command-line interface for searching them:
use strict;
use warnings;
use JSON::XS();
use ElasticSearch();
use ElasticSearch::Util qw(filter_keywords);
use HTTP::Lite();
use v5.12.0;
my $url = 'http://github.com/api'
. '/v2/json/issues/list/elasticsearch/elasticsearch/open';
my $json = JSON::XS->new->utf8(1)->pretty(1);
my $es = ElasticSearch->new( servers => '' );
# Download issues list from github
my $http = HTTP::Lite->new();
my $req = $http->request($url);
die "couldn't retrieve issues list" unless $req && $req == 200;
my $issues = $json->decode( $http->body )->{issues};
# delete index in case it already exists, then create the index
eval { $es->delete_index( index => 'issues' ) };
$es->create_index( index => 'issues' );
# prepare issues for indexing
my $id = 1;
my @docs;
for (@$issues) {
    # each doc needs an index, a type, an ID and data
    my $doc
        = { index => 'issues', type => 'entry', id => $id++, data => $_ };
    # we want to 'create' each doc (as opposed to 'index' or 'delete')
    push @docs, { create => $doc };
}
# bulk index docs
my $res = $es->bulk( \@docs );
if ( $res->{errors} ) {
    die "Bulk index had issues: " . $json->encode( $res->{errors} );
}
# force all changes to be refreshed immediately
$es->refresh_index( index => 'issues' );
say "Total issues indexed: " . $es->count( match_all => {} )->{count};
# search for issues
while (1) {
    print "\nEnter keywords to search for, or an issue ID:\n > ";
    my $keywords = <> // '';
    chomp $keywords;
    last unless $keywords;
    # if an issue ID, retrieve the doc and display it
    if ( $keywords =~ /^\d+$/ ) {
        my $doc = $es->get(
            index => 'issues',
            type  => 'entry',
            id    => $keywords
        );
        for my $key ( sort keys %$doc ) {
            my $val = $doc->{$key} // '';
            say "$key: $val";
        }
        say '-' x 60;
        next;
    }
    # otherwise, we're searching for keywords, so filter
    # them to make sure the keywords don't include special chars
    $keywords = filter_keywords($keywords);
    my $result
        = $es->search( query => { field => { _all => $keywords } } );
    say "Total results found: " . $result->{hits}{total};
    printf( " - %02d: %s\n", $_->{_id}, $_->{_source}{title} )
        for @{ $result->{hits}{hits} };
}
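The prompt loop treats a purely numeric line as an issue ID and anything else as keywords. That decision can be tried out without a server using the sketch below. Both helpers are hypothetical: strip_special is only a stand-in of my own and is not the implementation of filter_keywords, and classify exists purely to make the branching testable.

```perl
use strict;
use warnings;

# Hypothetical stand-in for filter_keywords(): NOT the module's code,
# just one plausible way to neutralise query syntax characters.
sub strip_special {
    my ($text) = @_;
    $text =~ s/[^\w\s]+/ /g;    # replace punctuation with spaces
    $text =~ s/^\s+|\s+$//g;    # trim both ends
    $text =~ s/\s+/ /g;         # collapse runs of whitespace
    return $text;
}

# Decide what the prompt loop should do with one line of input
sub classify {
    my ($line) = @_;
    chomp $line;
    return ['quit'] unless length $line;
    return [ id => $line ] if $line =~ /^\d+$/;
    return [ keywords => strip_special($line) ];
}

for my $input ( "42\n", "bulk +index (thrift)\n", "\n" ) {
    my ( $kind, $value ) = @{ classify($input) };
    printf "%-8s %s\n", $kind, defined $value ? $value : '';
}
```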