ElasticSearch.pm v0.36, now with extra sugar

ElasticSearch v0.16.0 was released yesterday with a long list of new features, enhancements and bug fixes.

ElasticSearch.pm v0.36 is on its way to CPAN as we speak.

Besides adding support for the new stuff in v0.16, I've also added a few features:

scrolled_search()

It has always been possible to scroll through a long list of results in ElasticSearch, but doing so required a bit of repetitive code, which is now nicely packaged up in scrolled_search(). So you can do:



    my $scroll = $es->scrolled_search(
        search_type => 'scan',    # efficient search type for scrolling
        scroll      => '2m',      # cache search results for the next 2 minutes
    );

    while (my $doc = $scroll->next(1)) {
         # do something
    }
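
For a feel of the kind of bookkeeping this hides, here is a minimal pure-Perl sketch of the general pattern: keep a buffer of results and refill it from the next page whenever it runs dry. It does not talk to a cluster and is not how ElasticSearch.pm implements scrolled_search(); make_scroller, $fetch_page and the fake pages are all hypothetical names made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the cluster: three pages of results, then none.
my @pages      = ( [ 1, 2 ], [ 3, 4 ], [5] );
my $fetch_page = sub { return @{ shift(@pages) || [] } };

# Minimal iterator: hand out buffered docs, refilling the buffer
# from the next page whenever it runs dry.
sub make_scroller {
    my $fetch  = shift;
    my @buffer = ();
    return sub {
        @buffer = $fetch->() unless @buffer;
        return shift @buffer;    # undef once every page is exhausted
    };
}

my $next = make_scroller($fetch_page);
my @seen;
while ( defined( my $doc = $next->() ) ) {
    push @seen, $doc;
}
print "@seen\n";
```

The caller only ever sees a simple loop, which is the point of wrapping the paging logic in an iterator.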


reindex()

Users on the mailing list are always asking how to reindex their data, either from one index to another on the same cluster, or from one cluster to another.

Together, scrolled_search() and reindex() now make this easy.

For example, to copy the ElasticSearch website index locally, you could do:



    my $local = ElasticSearch->new(
        servers => 'localhost:9200'
    );
    my $remote = ElasticSearch->new(
        servers    => 'search.elasticsearch.org:80',
        no_refresh => 1
    );

    my $source = $remote->scrolled_search(
        search_type => 'scan',
        scroll      => '5m'
    );
    $local->reindex(source=>$source);

To copy one local index to another, upper-casing the title,
excluding docs of type boring, and preserving the version numbers
from the original index:



    my $source = $es->scrolled_search(
        index       => 'old_index',
        search_type => 'scan',
        scroll      => '5m',
        version     => 1
    );

    $es->reindex(
        source      => $source,
        dest_index  => 'new_index',
        transform   => sub {
            my $doc = shift;
            return if $doc->{_type} eq 'boring';
            $doc->{_source}{title} = uc( $doc->{_source}{title} );
            return $doc;
        }
    );
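
The transform callback is plain Perl, so you can try it out without a cluster. The sketch below runs the same callback over a couple of made-up docs (the @docs data is hypothetical, shaped like search hits) to show the contract used in the example above: return nothing to skip a doc, return the doc to keep it.

```perl
use strict;
use warnings;

# The transform callback from the example above, on its own.
# Returning nothing skips the doc; returning the doc keeps it.
my $transform = sub {
    my $doc = shift;
    return if $doc->{_type} eq 'boring';
    $doc->{_source}{title} = uc( $doc->{_source}{title} );
    return $doc;
};

# Hypothetical hits, shaped like the docs a scrolled search returns.
my @docs = (
    { _type => 'post',   _id => 1, _source => { title => 'hello world' } },
    { _type => 'boring', _id => 2, _source => { title => 'skip me' } },
);

# Call in scalar context so a skipped doc yields undef, not an empty list.
my @kept = grep { defined } map { scalar $transform->($_) } @docs;

print "$_->{_id}: $_->{_source}{title}\n" for @kept;
```

Note that the callback modifies the doc in place, so the title in @docs is upper-cased too.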

no_refresh

By default, ElasticSearch.pm retrieves a list of live nodes from the ElasticSearch cluster, and round-robins between them.

However, if you are talking to a remote ElasticSearch cluster, or a cluster behind a proxy, this may not be desirable behaviour. The no_refresh parameter turns off the discovery of live nodes. Instead, ElasticSearch.pm round-robins through the list of servers passed to new(), and can fail over between them:



    my $es = ElasticSearch->new(
        servers    => ['es1.search.com:80', 'es2.search.com:80'],
        no_refresh => 1
    );
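
To illustrate what round-robin with failover over a fixed list means, here is a rough pure-Perl sketch. It is not the code inside ElasticSearch.pm: make_round_robin is a hypothetical helper, and the failing request is simulated rather than sent over the network.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of ElasticSearch.pm: try each server in
# turn, starting after the last one used, and return the first success.
sub make_round_robin {
    my @servers = @_;
    my $next    = 0;
    return sub {
        my $request = shift;    # code ref: takes a server, dies on failure
        for ( 1 .. @servers ) {
            my $server = $servers[$next];
            $next = ( $next + 1 ) % @servers;
            my $result;
            return $result if eval { $result = $request->($server); 1 };
            warn "Request to $server failed, trying next server\n";
        }
        die "All servers failed\n";
    };
}

my $call = make_round_robin( 'es1.search.com:80', 'es2.search.com:80' );

# Simulate es1 being down: the call fails over to es2.
my $answer = $call->(
    sub {
        my $server = shift;
        die "connection refused\n" if $server =~ /^es1/;
        return "ok from $server";
    }
);
print "$answer\n";
```

Because the list is fixed, no request is ever sent to a node address that only makes sense inside the remote network, which is what you want behind a proxy.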


About Clinton Gormley

The doctor will see you now...