<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Clinton Gormley</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/" />
    <link rel="self" type="application/atom+xml" href="http://blogs.perl.org/users/clinton_gormley/atom.xml" />
    <id>tag:blogs.perl.org,2009-11-03:/users/clinton_gormley//239</id>
    <updated>2012-02-21T16:25:18Z</updated>
    <subtitle>Perl and stuff</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.38</generator>

<entry>
    <title>RFC: Single or multiple instances of ORM objects?</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2012/02/rfc-single-or-multiple-instances-of-orm-objects.html" />
    <id>tag:blogs.perl.org,2012:/users/clinton_gormley//239.2852</id>

    <published>2012-02-21T15:25:17Z</published>
    <updated>2012-02-21T16:25:18Z</updated>

    <summary>In our homegrown ORM we have an in-memory cache, which enables us to ensure that only one instance of any object is live in memory at any one time. In other words: $one = MyObject-&gt;get(123); $two = MyObject-&gt;get(123); refaddr($one) ==...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="elasticsearch" label="elasticsearch" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>In our homegrown ORM we have an in-memory cache, which enables us
to ensure that only one instance of any object is live in memory 
at any one time.  </p>

<p>In other words:</p>

<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
    $one = MyObject->get(123);
    $two = MyObject->get(123);

    refaddr($one) == refaddr($two)
</code></pre>

<p>I find this setup useful because:</p>

<ul>
<li>if you update one copy of the object, all other copies automatically update</li>
<li>get&#8217;ing the object again is cheap</li>
</ul>

<p>When I do a search against the DB, it returns a list of objects,
which I can then retrieve (in bulk) from:</p>

<pre><code>-&gt; the in memory cache
  -&gt; memcached
    -&gt; the DB
</code></pre>

<p>No DB-based object contains another DB-based object, to avoid circular
references.  Instead, it just contains the ID of the object. 
Retrieving the actual object is cheap (assuming it has already
been loaded) because we can just request the single instance of 
that object from the in-memory cache.</p>

<p>The in-memory cache is cleared at the end of each web-request.</p>
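
<p>A minimal sketch of such a per-request identity map might look like the following. It is not our actual ORM code: <code>_load()</code> and <code>clear_cache()</code> are hypothetical names, and <code>_load()</code> merely stands in for the memcached/DB lookup.</p>

<pre><code>
    use strict;
    use warnings;
    use Scalar::Util qw(refaddr);

    package MyObject;

    my %cache;    # per-request identity map: id => the single live instance

    # Hypothetical stand-in for the memcached/DB lookup.
    sub _load {
        my ( $class, $id ) = @_;
        return bless { id => $id }, $class;
    }

    # Return the cached instance if there is one, otherwise load and cache it.
    sub get {
        my ( $class, $id ) = @_;
        return $cache{$id} ||= $class->_load($id);
    }

    # Hypothetical hook, to be called at the end of each web request.
    sub clear_cache { %cache = () }

    package main;

    my $one = MyObject->get(123);
    my $two = MyObject->get(123);
    print "same instance\n" if refaddr($one) == refaddr($two);
</code></pre>

<p>Because both variables hold the same reference, a change made through one is immediately visible through the other.</p>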

<p>The above is pretty similar to how KiokuDB works.</p>

<h2>THE FUTURE AND BEYOND:</h2>

<p>I&#8217;m currently working on an &#8220;ORM&#8221; that uses ElasticSearch as its
backend. (&#8220;ORM&#8221; is in quotes because ES functions as a 
Lucene-powered document store, rather than being a relational DB).</p>

<p>I&#8217;d like to replicate the current functionality, because I think it 
has merits, but there is a complication: </p>

<blockquote>
  <p><strong>Time doesn&#8217;t necessarily flow forwards</strong></p>
</blockquote>

<p>To explain: </p>

<ul>
<li>ES has real-time GET. In other words, as soon as a
document has been indexed (saved), it is available to be retrieved
by its unique ID</li>
<li>When searching for documents, the full document is returned (by
default), which means that you don&#8217;t have to do a second request
to GET the document, but:</li>
<li>ES has NEAR-real-time SEARCH.  Once a second (by default), the
search view is refreshed to include changes that have occurred 
during the last second</li>
</ul>

<p>What this means is that I could:</p>

<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
    GET doc 123        -> returns version 6
    SEARCH for doc 123 -> returns version 5
</code></pre>

<p>This would normally never happen in a traditional DB, because updates
are atomic, and indexes are updated as the document is indexed. But it 
could happen in a master-slave setup where there is replication lag.</p>

<p>Also, I&#8217;m guessing this is a common scenario in NoSQL datastores.</p>

<h2>Note: </h2>

<p>This is an issue just for the current request, not
for writes to ES.  Every doc in ES has a _version number, and if 
you try to update the wrong version, it will throw a Conflict error, 
in which case you can: </p>

<ul>
<li>get the latest version, reapply your changes and save, or</li>
<li>instruct ES to ignore the version and to update the doc regardless</li>
</ul>
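
<p>The first option (refetch, reapply, save) can be sketched as a small retry loop. This is only an illustration: <code>%store</code>, <code>get_doc()</code>, <code>save_doc()</code> and <code>update_with_retry()</code> are hypothetical names, and the in-memory hash merely imitates the version check that ES performs.</p>

<pre><code>
    use strict;
    use warnings;

    # Hypothetical in-memory stand-in for ES: id => { version, data }
    my %store = ( 123 => { version => 1, data => { name => 'Joe' } } );

    # Return a private copy of the stored doc together with its version.
    sub get_doc {
        my ($id) = @_;
        my $doc = $store{$id};
        return { version => $doc->{version}, data => { %{ $doc->{data} } } };
    }

    # Save only if the caller holds the current version; otherwise die
    # with "Conflict". Returns the new version number.
    sub save_doc {
        my ( $id, $version, $data ) = @_;
        die "Conflict\n" if $version != $store{$id}{version};
        $store{$id} = { version => $version + 1, data => { %$data } };
        return $store{$id}{version};
    }

    # On conflict: get the latest version, reapply the change, try again.
    sub update_with_retry {
        my ( $id, $change, $tries ) = @_;
        for ( 1 .. ( $tries || 3 ) ) {
            my $doc = get_doc($id);
            $change->( $doc->{data} );
            my $new = eval { save_doc( $id, $doc->{version}, $doc->{data} ) };
            return $new if defined $new;
            die $@ unless $@ eq "Conflict\n";
        }
        die "Gave up after repeated conflicts\n";
    }

    print update_with_retry( 123, sub { $_[0]{name} = 'Bob' } ), "\n";
</code></pre>

<p>The change is passed in as a callback so that it can be reapplied to whichever version turns out to be the latest.</p>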

<p>So where might this be a problem:</p>

<h2>Scenarios:</h2>

<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
    $a = get     -> version 1
    $b = search  -> version 1
</code></pre>

<p>This one is easy. $b can just reuse the object in $a.</p>

<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
    $a = get     -> version 1
    $b = search  -> version 1
    $a->change()
    $a->save()   -> version 2
</code></pre>

<p>Potentially, the object no longer matches the search that you did,
so you may be displaying incorrect results. (eg you search
for name == &#8216;Joe&#8217;, then change name to &#8216;Bob&#8217;).  But this looks
like a reasonable process to me.</p>

<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
    $a = get     -> version 2
    $b = search  -> version 1
</code></pre>

<p>Our search has returned an older version of the object. The newer
version might or might not match the search parameters.  Do we display
the old results? or the new results?</p>

<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
    $a = get     -> version 1
    $a->change()
    $b = search  -> version 1
</code></pre>

<p>We have a changed (but as yet unsaved) object in the cache. Should
$b contain the changed object, or the pristine object?</p>

<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
    $a = get     -> version 1
    $a->change()
    $b = search  -> version 2
</code></pre>

<p>We have an old (and changed) version in $a. We know that a newer
version already exists in the DB, so we&#8217;ll get a conflict error
if we try to save $a.  What do we do?</p>

<h2>Proposal:</h2>

<p>I think my logic will look something like this:</p>

<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
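    my ($class,$id,$version,$data) = @_;

    if (my $cached = $cache->{$id}) {

        # the cached copy is at least as new as the incoming one: reuse it
        return $cached
            if $version <= $cached->{version};

        # the incoming copy is newer: refresh the cached instance in place,
        # unless it has unsaved edits
        return $cached->re_new($data)
            unless $cached->has_changed;

    }
    return $cache->{$id} = $class->new($data);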
</code></pre>

<p>In other words, all instances of the object are always updated to
the latest version, EXCEPT if the current instance has been edited
and not yet saved. (Saving will throw a conflict error later on anyway).</p>
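
<p>To make that rule concrete, here is a small self-contained sketch. The <code>Doc</code> class and the <code>from_result()</code> and <code>change()</code> methods are hypothetical, and it assumes that an edited, unsaved instance simply stays in the cache when a newer version arrives, which is only one possible reading of the rule.</p>

<pre><code>
    use strict;
    use warnings;

    package Doc;

    my %cache;    # id => the single live instance

    sub new {
        my ( $class, $data ) = @_;
        return bless { %$data, changed => 0 }, $class;
    }

    sub has_changed { return $_[0]{changed} }

    # Overwrite the contents in place, so every holder of the object sees them.
    sub re_new {
        my ( $self, $data ) = @_;
        %$self = ( %$data, changed => 0 );
        return $self;
    }

    # Edit some fields and mark the instance as having unsaved changes.
    sub change {
        my ( $self, %fields ) = @_;
        $self->{$_} = $fields{$_} for keys %fields;
        $self->{changed} = 1;
        return $self;
    }

    sub from_result {
        my ( $class, $id, $version, $data ) = @_;
        if ( my $cached = $cache{$id} ) {
            return $cached if $version <= $cached->{version};
            return $cached->re_new($data) unless $cached->has_changed;
            return $cached;    # assumption: unsaved edits are kept
        }
        return $cache{$id} = $class->new($data);
    }

    package main;

    my $one = Doc->from_result( 123, 1, { version => 1, name => 'Joe' } );
    my $two = Doc->from_result( 123, 2, { version => 2, name => 'Bob' } );
    print "$one->{name} v$one->{version}\n";    # $one was refreshed in place
</code></pre>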

<p>Also, if you wanted to &#8220;detach&#8221; an object, then you could clone it and update it independently.</p>

<p>The only issue is that search results may contain a newer object
which no longer matches the search parameters.  Personally, I&#8217;m 
probably happy to live with this, but I probably need (a) a
default setting and (b) a dynamic flag which the user can use
to control this behaviour.</p>

<p>Thanks for getting to the bottom of this.</p>

<p>What do you think? See any obvious (or not-so-obvious) flaws?</p>

<p>(Also posted to <a href="http://perlmonks.org/?node_id=955346">PerlMonks</a> )</p>
]]>
        

    </content>
</entry>

<entry>
    <title>ElasticSearch::Sequence - a blazing fast ticket server</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2011/10/elasticsearchsequence---a-blazing-fast-ticket-server.html" />
    <id>tag:blogs.perl.org,2011:/users/clinton_gormley//239.2334</id>

    <published>2011-10-22T13:00:19Z</published>
    <updated>2011-10-22T14:14:08Z</updated>

    <summary>I&apos;m considering ditching my RDBM for my next application and using ElasticSearch as my only data store. My home-grown framework uses unique IDs for all objects, which currently come from a MySQL auto-increment column, and my framework expects the unique...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="perlelasticsearch" label="perl elasticsearch" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>I'm considering ditching my RDBMS for my next application and using <a href="http://www.elasticsearch.org/">ElasticSearch</a> as my only data store.
</p>

<p>My home-grown framework uses unique IDs for all objects, which currently come from a MySQL auto-increment column, and my framework expects the unique ID to be an integer.
</p>

<p>ElasticSearch has its own unique auto-generated IDs, but:</p>
<ol>
	<li>they look like this '<code>KpSb_Jd_R56dH5Qx6TtxVA</code>' and I'd say are less human-readable than an integer</li>
<li>I would need to change a fair bit of legacy code to migrate to non-integer IDs</li>
</ol> 

<p>Initially I thought I could keep MySQL around as a ticket server, <a href="http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/">as described by Flickr</a> but then I wondered if I could achieve the same thing by abusing <a href="http://www.elasticsearch.org/guide/reference/api/index_.html">ElasticSearch's built-in versioning</a>, allowing me to ditch MySQL completely, and give me a distributed ticket server with high availability into the bargain. </p>

<p>The logic is simple: when you index a document in ElasticSearch, it returns a new version number for the document, which is always incrementing and is guaranteed to be unique across the cluster.</p>

<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
# MAIL ID
curl -XPUT 'http://127.0.0.1:9200/sequence/sequence/mail_id?pretty=1' 

# {
#    "ok" : true,
#    "_index" : "sequence",
#    "_id" : "mail_id",
#    "_type" : "sequence",
#    "_version" : 1         # note: version number
# }

</code></pre>

<p>We can have multiple distinct sequences by storing a document with a different ID for each sequence.</p>

<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
curl -XPUT 'http://127.0.0.1:9200/sequence/sequence/other_id?pretty=1' 

# {
#    "ok" : true,
#    "_index" : "sequence",
#    "_id" : "other_id",    # note: different ID
#    "_type" : "sequence",
#    "_version" : 1
# }

</code></pre>

<p>ElasticSearch enables a bunch of features by default, which are very useful for using it as a document store and a full text search server, but aren't relevant in this situation, and will just slow it down. </p>
<p>The amount of data will be tiny, so our index only needs one primary shard, not the 5 that are created by default in ElasticSearch.  But for high-availability purposes, we'd like this shard to be replicated across all nodes in our cluster. So the index settings look like this:</p>
<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
   "settings" : {
      "number_of_shards" : 1,           
      "auto_expand_replicas" : "0-all"  
   },

</code></pre>

<p>For the type mapping (like a schema in a database) we want to turn off the <code>_all</code> field and <code>_source</code> field, disable <code>_type</code> indexing, and disable indexing for the document (which is only ever going to be an empty hashref): </p>
<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
   "sequence" : {
      "_source" : { "enabled" : 0 },
      "_all"    : { "enabled" : 0 },
      "_type"   : { "index" : "no" },
      "enabled" : 0
   }

</code></pre>

<p>So the full command to create the index and set the type mapping looks like this:</p>
<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
curl -XPUT 'http://127.0.0.1:9200/sequence/?pretty=1'  -d '
{
   "settings" : {
      "number_of_shards"     : 1,           
      "auto_expand_replicas" : "0-all"  
   },
   "mappings" : {
      "sequence" : {
         "_source" : { "enabled" : 0 },
         "_all"    : { "enabled" : 0 },
         "_type"   : { "index" : "no" },
         "enabled" : 0
      }
   }
}
'

</code></pre>

<p>Requesting a single ID (indexing the doc to get a new version) at a time is going to be relatively slow, as there is a fair bit of HTTP latency per request.  This is fine for normal use, but our ticket server has to be super fast.</p>
<p>So instead, I'm going to request several new version numbers at once using the <a href="http://www.elasticsearch.org/guide/reference/api/bulk.html">bulk API</a>, and buffer them.</p>
<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
curl -XPOST 'http://127.0.0.1:9200/_bulk?pretty=1'  -d '
{"index":{"_index":"sequence","_type":"sequence","_id":"mail_id"}}
{}
{"index":{"_index":"sequence","_type":"sequence","_id":"mail_id"}}
{}
[*** SNIP ***]
'

# {
#    "items" : [
#       {
#          "index" : {
#             "ok" : true,
#             "_index" : "sequence",
#             "_id" : "mail_id",
#             "_type" : "sequence",
#             "_version" : 1
#          }
#       },
#       {
#          "index" : {
#             "ok" : true,
#             "_index" : "sequence",
#             "_id" : "mail_id",
#             "_type" : "sequence",
#             "_version" : 2
#          }
#       },
[*** SNIP ***]

</code></pre>
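
<p>The buffering idea can be sketched as a small iterator. This is not the module's actual code: <code>make_sequence()</code> is a hypothetical helper, and <code>$fake_bulk</code> is a stand-in for the bulk request (a real version would return the <code>_version</code> values from the bulk response).</p>

<pre><code>
    use strict;
    use warnings;

    # Returns an iterator that hands out IDs from a local buffer and calls
    # $fetch_block->($size) to refill the buffer whenever it runs dry.
    sub make_sequence {
        my ( $fetch_block, $size ) = @_;
        my @buffer;
        return sub {
            @buffer = $fetch_block->($size) unless @buffer;
            return shift @buffer;
        };
    }

    # Hypothetical stand-in for the bulk request: just counts upwards.
    my $version   = 0;
    my $fake_bulk = sub { my ($n) = @_; return map { ++$version } 1 .. $n };

    my $next_id = make_sequence( $fake_bulk, 100 );
    print $next_id->(), "\n" for 1 .. 3;
</code></pre>

<p>Only one call in every <code>$size</code> pays the HTTP latency; the rest are served from memory. The trade-off is that any IDs still in the buffer when the process exits are never used, so the sequence will have gaps.</p>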

<h2>ElasticSearchX::Sequence</h2>
<p>I've wrapped up all of the above and released it as <a href="https://metacpan.org/module/DRTECH/ElasticSearchX-Sequence-0.01/lib/ElasticSearchX/Sequence.pm">ElasticSearchX::Sequence</a></p>
<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
use ElasticSearch();
use ElasticSearchX::Sequence();
 
my $es  = ElasticSearch->new();
my $seq = ElasticSearchX::Sequence->new( es => $es );
 
$seq->bootstrap();   # setup the index and type mapping
 
my $it  = $seq->sequence('mail_id');
 
my $mail_id = $it->next;

</code></pre>

<h2>Benchmarks</h2>
<p>I wrote a small <a href="https://metacpan.org/source/DRTECH/ElasticSearchX-Sequence-0.01/benchmark/benchmark.pl">benchmark script</a> which compares:</p>

<ol>
    <li>MySQL, using the ticket method described by Flickr</li>
    <li>this module, using the httptiny backend</li>
    <li>this module, using the curl backend</li>
    <li>this module, using the curl backend but only requesting blocks of 10 IDs at a time</li>
</ol>    

<p>The results (run on my laptop) are pretty startling:</p>
<pre style="color: black; padding: 10px; line-height: 1.4; overflow-x: auto; border-radius: 6px 6px 6px 6px; background: rgb(248, 248, 248); border: 1px dotted rgb(204, 204, 204); font-size: 92%;"><code>
               Rate es_curl_10  db_ticket    es_tiny    es_curl
es_curl_10  38760/s         --       -48%       -55%       -72%
db_ticket   74627/s        93%         --       -13%       -47%
es_tiny     85470/s       121%        15%         --       -39%
es_curl    140845/s       263%        89%        65%         --

</code></pre>

<p>If you are already using ElasticSearch as your search server, (and if you're not, you should be - it's fantastic), and you're currently using your DB as a ticket server, I'd consider moving this function over to ElasticSearch instead.</p>

]]>
        
    </content>
</entry>

<entry>
    <title>Perlish concise query syntax for ElasticSearch</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2011/07/perlish-concise-query-syntax-for-elasticsearch.html" />
    <id>tag:blogs.perl.org,2011:/users/clinton_gormley//239.1941</id>

    <published>2011-07-03T21:43:08Z</published>
    <updated>2011-07-04T08:23:05Z</updated>

    <summary>Announcing ElasticSearch::SearchBuilder In Perl, we like to put important things first, so the ElasticSearch query language has always felt a bit wrong to me. For instance, to find docs where the content field contains the text keywords: # op field...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="elasticsearchperlfulltextsearch" label="elasticsearch perl fulltextsearch" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<h2>Announcing ElasticSearch::SearchBuilder</h2>

<p>In Perl, we like to put important things first, so the ElasticSearch query language has always felt a bit wrong to me. For instance, to find docs where the <code>content</code> field contains the text <code>keywords</code>:</p>

<p><code><pre>
    # op        field       value
    { text => { content => 'keywords' } }
</pre></code>    </p>

<p>To me, the important part of this is the field that we&#8217;re operating on, so this feels more natural:</p>

<p><code><pre>
    # field        op       value
    { content => { text => 'keywords' }}
</pre></code>    </p>

<p>So, in the spirit of <a href="http://beta.metacpan.org/module/SQL::Abstract">SQL::Abstract</a> I am proud to announce <a href="http://beta.metacpan.org/module/ElasticSearch::SearchBuilder">ElasticSearch::SearchBuilder</a>, which is tightly integrated into the latest <a href="http://beta.metacpan.org/module/ElasticSearch">ElasticSearch.pm</a> version 0.38.</p>

<p>Any method which takes a <code>query</code> or <code>filter</code> param (eg <code>search()</code>) now also accepts a <code>queryb</code> or <code>filterb</code> parameter instead, whose value will be parsed via SearchBuilder:</p>

<p>Do a full text search of the  <code>_all</code> field for  <code>'my keywords'</code>:
<code><pre>
    $es->search( queryb=> 'my keywords' );
</pre></code></p>

<p>Find docs whose title field contains the text <code>apple</code> but not  <code>orange</code>, whose  <code>status</code> field contains the value  <code>active</code>:</p>

<p><code><pre>
$es->search(
    queryb => {
        title => {
            '='  => 'apple',
            '!=' => 'orange'
        },
        -filter => {
            status => 'active'
        }
    }
)
</pre></code></p>

<h2>If you have suggestions to improve the API or the documentation, please get in touch.</h2>

<p><a href="http://search.metacpan.org:8000/">You can try out ElasticSearch::SearchBuilder here</a>.</p>

<p>And finally, a more complex example, to demonstrate how much more concisely you can write queries:</p>

<p>Out of all docs published in 2010 and tagged with either &#8220;perl&#8221; or &#8220;ruby&#8221;, find those whose <code>title</code> field contains &#8220;my keywords&#8221;, in which case consider this doc to be particularly relevant (<code>boost: 2</code>) or the <code>title</code> field is missing but the <code>body</code> field contains <code>'my keywords'</code>:</p>

<p><code><pre>
$es->search(
    queryb => {
        -or => [
            {
                title => {
                    '=' => {
                        query => 'my keywords',
                        boost => 2
                }}
            },
            {
                body    => 'my_keywords',
                -filter => {
                    -missing => 'title'
                }
            },
        ],
        -filter => {
            tags => [ 'perl','ruby' ],
            date => {
                '>=' => '2010-01-01',
                '&lt;'  => '2011-01-01'
            },
        }
    }
)
</pre></code></p>

<p>is the equivalent of:</p>

<p><code><pre>
    $es->search(
        query => {
            filtered => {
                filter => {
                    and => [
                        { 
                            terms => { 
                                tags => ["perl", "ruby"] 
                            } 
                        },
                        { 
                            numeric_range => { 
                                date => { 
                                    gte => "2010-01-01", 
                                    lt => "2011-01-01" 
                                }
                            }
                        }
                    ],
                },
                query  => {
                    bool => {
                        should => [
                            { 
                                text => { 
                                    title => { 
                                        boost => 2, 
                                        query => "my keywords" 
                                    } 
                                } 
                            },
                            { 
                                filtered => {
                                    filter => { 
                                        missing => { 
                                            field => "title" 
                                        } 
                                    },
                                    query  => { 
                                        text => { 
                                            body => "my_keywords" 
                                        } 
                                    },
                                }
                            }
                        ],
                    }
                }
            }
        }
    )
</pre></code></p>

<p>Which looks better to you?</p>
]]>
        

    </content>
</entry>

<entry>
    <title>ElasticSearch.pm v0.37 released, with a small breaking change</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2011/05/elasticsearchpm-v037-released-with-a-small-breaking-change.html" />
    <id>tag:blogs.perl.org,2011:/users/clinton_gormley//239.1716</id>

    <published>2011-05-01T12:14:04Z</published>
    <updated>2011-05-01T12:17:52Z</updated>

    <summary>Just released ElasticSearch.pm v 0.37 which has a small breaking change. In version 0.36, $scrolled_search-&gt;next() returned the next $size results. Now, by default it returns the next one result, which makes it easier to write: while ( my $result =...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>Just released <a href="http://search.cpan.org/~drtech/ElasticSearch/">ElasticSearch.pm v 0.37</a> which has a small breaking change.</p>

<p>In version 0.36, <code>$scroller->next()</code> returned the next <code>$size</code> results.  Now, by default, it returns just the next single result, which makes it easier to write:</p>

<pre><code>
    while ( my $result = $scroller->next ) {...}
</code></pre>]]>
        
    </content>
</entry>

<entry>
    <title>ElasticSearch.pm v0.36, now with extra sugar</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2011/04/elasticsearchpm-v036-now-with-extra-sugar.html" />
    <id>tag:blogs.perl.org,2011:/users/clinton_gormley//239.1689</id>

    <published>2011-04-24T19:16:16Z</published>
    <updated>2011-04-25T08:17:38Z</updated>

    <summary>ElasticSearch v 0.16.0 was released yesterday with a long list of new features, enhancements and bug fixes. ElasticSearch.pm v 0.36 is on its way to CPAN as we speak. Besides adding support for the new stuff in v 0.16, I&apos;ve...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p><a href="http://www.elasticsearch.org/blog/2011/04/23/0.16.0-released.html">ElasticSearch v 0.16.0</a> was released yesterday with a long list of <a href="http://www.elasticsearch.org/download/2011/04/23/0.16.0.html">new features, enhancements and bug fixes</a>.</p>

<p><a href="http://search.cpan.org/~drtech/ElasticSearch/">ElasticSearch.pm v 0.36</a> is on its way to <span class="caps">CPAN</span> as we speak.</p>

<p>Besides adding support for the new stuff in v 0.16, I've also added a few features:</p>

<h2>scrolled_search()</h2>

<p>It is possible to scroll through a long list of results in ElasticSearch, but this required a bit of repetitive code, which is now nicely packaged up in <code>scrolled_search</code>.  So you can do:</p>



<pre><code>

    $scroll = $es-&gt;scrolled_search( 
        search_type =&gt; 'scan',   # efficient search type for scrolling
        scroll =&gt; '2m', # cache search results for the next 2 minutes
    );

    while (my $doc = $scroll-&gt;next(1)) {
         # do something
    }


</code></pre>



<h2>reindex()</h2>

<p>Users on the mailing list are always asking how to reindex their data, either from one index to another on the same cluster, or from one cluster to another.</p>

<p>Now, <code>scrolled_search()</code> and <code>reindex()</code> make it easy to do this in a single command.</p>

<p>For example:</p>

<p>To copy the ElasticSearch website index locally, you could do:</p>



<pre><code>

    my $local = ElasticSearch-&gt;new(
        servers =&gt; 'localhost:9200'
    );
    my $remote = ElasticSearch-&gt;new(
        servers    =&gt; 'search.elasticsearch.org:80',
        no_refresh =&gt; 1
    );

    my $source = $remote-&gt;scrolled_search(
        search_type =&gt; 'scan',
        scroll      =&gt; '5m'
    );
    $local-&gt;reindex(source=&gt;$source);

</code></pre>



<p>To copy one local index to another, making the title upper case,
excluding docs of type <code>boring</code>, and preserving the version numbers
from the original index:</p>



<pre><code>

    my $source = $es-&gt;scrolled_search(
        index       =&gt; 'old_index',
        search_type =&gt; 'scan',
        scroll      =&gt; '5m',
        version     =&gt; 1
    );

    $es-&gt;reindex(
        source      =&gt; $source,
        dest_index  =&gt; 'new_index',
        transform   =&gt; sub {
            my $doc = shift;
            return if $doc-&gt;{_type} eq 'boring';
            $doc-&gt;{_source}{title} = uc( $doc-&gt;{_source}{title} );
            return $doc;
        }
    );

</code></pre>



<h2>no_refresh</h2>

<p>By default, ElasticSearch.pm retrieves a list of live nodes from the ElasticSearch cluster, and round-robins around them.</p>

<p>However, if you are talking to a remote ES cluster, or a cluster behind a proxy, this may not be desirable behaviour. The <code>no_refresh</code> parameter turns off the discovery of live nodes. Instead, <span class="caps">ES</span>.pm round-robins through the list of servers passed to <code>new()</code>, and fails over between the servers in that list:</p>



<pre><code>

    my $es = ElasticSearch-&gt;new(
        servers =&gt; ['es1.search.com:80', 'es2.search.com:80'],
        no_refresh =&gt; 1
    );


</code></pre>]]>
        
    </content>
</entry>

<entry>
    <title>Firefox 4 bug adding inline script tag - last tag doesn&apos;t fire</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2011/03/firefox-4-bug-adding-inline-script-tag---last-tag-doesnt-fire.html" />
    <id>tag:blogs.perl.org,2011:/users/clinton_gormley//239.1592</id>

    <published>2011-03-25T19:02:45Z</published>
    <updated>2011-03-25T20:26:18Z</updated>

    <summary><![CDATA[I came across a nasty bug in Firefox 4 today, which will break a lot of AJAX. All script tags fire, except for the last one, eg: function test(num) { var temp = document.createElement('div'); var src = '&lt;script type=&quot;text/javascript&quot;&gt;' +...]]></summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
    <category term="ff4firefoxjavascript" label="FF4 Firefox Javascript" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>I came across a nasty bug in Firefox 4 today, which will break a lot of <span class="caps">AJAX</span>.</p>

<p>All script tags fire, except for the last one, eg:</p>

<code>


<pre>

function test(num) {
    var temp = document.createElement('div');
    var src  = '&lt;script type=&quot;text/javascript&quot;&gt;' +
                       'alert(&quot;Hi&quot;);'  + 
                    '&lt;' + '/script&gt;';
    if (num === 2) {
        src = src + '&lt;script&gt;&lt;/' +'script&gt;';
    }
    temp.innerHTML = src;
    document.body.appendChild(temp);
}

</pre>


</code><br />
<p>Calling <code>test(1)</code> will do nothing, while <code>test(2)</code> will produce an alert popup.</p>

<p>Reported as bug: <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=645115">https://bugzilla.mozilla.org/show_bug.cgi?id=645115</a></p>]]>
        
    </content>
</entry>

<entry>
    <title>Lazyweb: ElasticSearch proxy </title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2011/02/lazyweb-elasticsearch-proxy.html" />
    <id>tag:blogs.perl.org,2011:/users/clinton_gormley//239.1459</id>

    <published>2011-02-11T11:51:49Z</published>
    <updated>2011-02-11T12:08:10Z</updated>

    <summary>Hi all We get a lot of people who want to use javascript to talk to their ElasticSearch server, directly from the browser. This poses a problem, as ElasticSearch doesn&apos;t offer any authentication, or request filtering. I&apos;d like to write...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
    <category term="perlelasticsearchdancerproxyauthentication" label="perl elasticsearch dancer proxy authentication" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>Hi all</p>

<p>We get a lot of people who want to use JavaScript to talk to their <a href="http://www.elasticsearch.org">ElasticSearch</a> server directly from the browser.</p>

<p>This poses a problem, as ElasticSearch doesn't offer any authentication or request filtering.</p>

<p>I'd like to write <strong>ElasticSearch::Proxy</strong>, which would be configurable to:</p>
<ul>
<li>allow restriction on GET/HEAD/POST/PUT/DELETE requests</li>
<li>parse the incoming JSON request, filter out anything that shouldn't be allowed, and then forward the request on to the ES server</li>
<li>allow authenticated requests, with different permissions</li>
</ul>

<p>With the module, I'd like to provide various ready-made server configurations, i.e. you should be able to plug it into mod_perl, Dancer, whatever...</p>

<p>I'm only familiar with mod_perl - haven't used any of the other frameworks.</p>

<p><strong>LAZYWEB</strong>: What webservers should I target, and are there any existing modules which may be useful to use with the above?</p>

<p>thanks</p>

<p>clint</p>]]>
        
    </content>
</entry>

<entry>
    <title>ElasticSearch.pm gets big performance boost </title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2010/10/elasticsearchpm-gets-big-performance-boost.html" />
    <id>tag:blogs.perl.org,2010:/users/clinton_gormley//239.1119</id>

    <published>2010-10-19T17:35:00Z</published>
    <updated>2010-10-20T18:31:10Z</updated>

    <summary>ElasticSearch version 0.12 is out today along with some nice new features. However, the thing I&apos;m most excited about is that ElasticSearch.pm v 0.26 is also out and has support for bulk indexing and pluggable backends, both of which add...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="elasticsearchperllucenefulltextsearchthrift" label="elasticsearch perl lucene fulltextsearch thrift" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p><a href="http://www.elasticsearch.com/download/">ElasticSearch version 0.12</a> is out today along with some nice <a href="http://www.elasticsearch.com/blog/">new features</a>. </p>

<p>However, the thing I'm most excited about is that <a href="http://search.cpan.org/~drtech/ElasticSearch/">ElasticSearch.pm v 0.26</a> is also out and has support for <b>bulk indexing</b> and <b>pluggable backends</b>, both of which add a significant performance boost.</p>

<h2>Pluggable backends</h2>

<p>I've factored out the parts which actually talk to the ElasticSearch server into the ElasticSearch::Transport module, which acts as a base class for ElasticSearch::Transport::HTTP (which uses <span class="caps">LWP</span>), ::HTTPLite (which uses, not surprisingly, <span class="caps">HTTP</span>::Lite) and ::Thrift, which uses the <a href="http://en.wikipedia.org/wiki/Thrift_%28protocol%29">Thrift protocol</a>.</p>

<p>I expected Thrift to be the big winner, but it turns out that the generated code is dog-slow. However, <span class="caps">HTTP</span>::Lite is about 20% faster than <span class="caps">LWP</span>:</p>

<pre><code>   httplite   :  63 seconds, 951 tps
   http       :  79 seconds, 759 tps
   thrift     :  690 seconds, 87 tps</code></pre>

<h2>Bulk indexing</h2>

<p>Since version 0.11, ElasticSearch has had a <code>bulk</code> operation, which can take a stream of <code>index</code>, <code>create</code> and <code>delete</code> statements in a single request.</p>

<p>For instance, you could do:</p>

<pre><code>   $es-&gt;bulk(
        { index =&gt; {
            index =&gt; 'foo', type=&gt;'bar', id=&gt;1, data =&gt; { foo =&gt; 'bar' }
        }},
        { create =&gt; { 
            index =&gt; 'foo', type=&gt;'bar', id=&gt;2, data =&gt; { foo =&gt; 'bar' }
        }},
        { delete =&gt; { 
            index =&gt; 'foo', type=&gt;'bar', id=&gt;1
        }}
    );</code></pre>

<p>The number of actions you can pass in depends on how much memory you have, both on the client and the server, and how big your documents are. </p>

<p>I tried tranches of 1,000, 5,000 and 10,000 documents at a time, and the results were very similar.</p>

<p>All tranches and all transports averaged about 7.5 seconds or <b>8,000 transactions per second</b>!  These are small documents, so I would be surprised to achieve this rate in the real world, but a 10x improvement is phenomenal.</p>

<p><i>(These benchmarks were run on my laptop with a single ElasticSearch node, over 59,950 documents (<code>{ text =&gt; $string }</code>) whose string value averaged 310 characters in length and consisted of real world text, not randomly generated gibberish.)</i></p>
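
<p>As a rough sketch of how a list of documents could be sent in tranches like those above: the <code>in_tranches</code> helper and its callback are hypothetical names used for illustration, not part of ElasticSearch.pm. With a live server, the callback might wrap <code>$es-&gt;bulk</code>; here it only records the size of each tranche, so the snippet runs without a server.</p>

<pre><code>    use strict;
    use warnings;

    # Hypothetical helper: hand the docs to $send in tranches of $size
    sub in_tranches {
        my ( $docs, $size, $send ) = @_;
        my @queue = @$docs;    # copy, so the caller's array is left untouched
        while ( my @tranche = splice @queue, 0, $size ) {
            $send-&gt;( \@tranche );
        }
    }

    # With a live server the callback might be: sub { $es-&gt;bulk( $_[0] ) }
    my @sizes;
    in_tranches( [ 1 .. 2500 ], 1000, sub { push @sizes, scalar @{ $_[0] } } );
    print &quot;@sizes\n&quot;;    # prints: 1000 1000 500
</code></pre>

<p>The tranche size is the knob to tune against client and server memory and the size of your documents.</p>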

<h2>Example script</h2>

<p><i>(This is now included in the examples directory of ElasticSearch.pm)</i></p>

<p>Finally, here is a simple example script which downloads from github all of the issues open against ElasticSearch, indexes them, and provides a simple command line interface to searching for them:</p>

<pre><code>   #!/usr/bin/perl

    use strict;
    use warnings;
    use JSON::XS();
    use ElasticSearch();
    use ElasticSearch::Util qw(filter_keywords);
    use HTTP::Lite();
    use v5.12.0;

    my $url = 'http://github.com/api'
        . '/v2/json/issues/list/elasticsearch/elasticsearch/open';

    my $json = JSON::XS-&gt;new-&gt;utf8(1)-&gt;pretty(1);
    my $es = ElasticSearch-&gt;new( servers =&gt; '127.0.0.1:9200' );

    # Download issues list from github
    my $http = HTTP::Lite-&gt;new();
    my $req  = $http-&gt;request($url);
    die &quot;couldn't retrieve issues list&quot; unless $req &amp;&amp; $req == 200;

    my $issues = $json-&gt;decode( $http-&gt;body )-&gt;{issues};

    # delete index in case it already exists, then create the index
    eval { $es-&gt;delete_index( index =&gt; 'issues' ) };
    $es-&gt;create_index( index =&gt; 'issues' );

    # prepare issues for indexing
    my $id = 1;
    my @docs;
    for (@$issues) {

        # each doc needs an index, a type, an ID and data
        my $doc
            = { index =&gt; 'issues', type =&gt; 'entry', id =&gt; $id++, data =&gt; $_ };

        # we want to 'create' each doc (as opposed to 'index' or 'delete')
        push @docs, { create =&gt; $doc };
    }

    # bulk index docs
    my $res = $es-&gt;bulk( \@docs );
    if ( $res-&gt;{errors} ) {
        die &quot;Bulk index had issues: &quot; . $json-&gt;encode( $res-&gt;{errors} );
    }

    # force all changes to be refreshed immediately
    $es-&gt;refresh_index();

    say &quot;Total issues indexed: &quot; . $es-&gt;count( match_all =&gt; {} )-&gt;{count};

    # search for issues
    while (1) {
        print &quot;\nEnter keywords to search for, or an issue ID:\n  &gt; &quot;;
        my $keywords = &lt;&gt;;
        chomp $keywords;
        last unless $keywords;

        # if an issue ID, retrieve the doc and display it
        if ( $keywords =~ /^\d+$/ ) {
            my $doc = $es-&gt;get(
                index =&gt; 'issues',
                type  =&gt; 'entry',
                id    =&gt; $keywords
            )-&gt;{_source};
            for my $key ( sort keys %$doc ) {
                my $val = $doc-&gt;{$key} // '';
                say &quot;$key: $val&quot;;
            }
            say '-' x 60;
            next;
        }

        # otherwise, we're searching for keywords, so filter
        # them to make sure the keywords don't include special chars
        $keywords = filter_keywords($keywords);

        my $result
            = $es-&gt;search( query =&gt; { field =&gt; { _all =&gt; $keywords } } );
        say &quot;Total results found: &quot; . $result-&gt;{hits}{total};
        printf( &quot; - %02d: %s\n&quot;, $_-&gt;{_id}, $_-&gt;{_source}{title} )
            for @{ $result-&gt;{hits}{hits} };
    }
 </code></pre>]]>
        
    </content>
</entry>

<entry>
    <title>ElasticSearch::AnyEvent pre-release available on github</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2010/08/elasticsearchanyevent-pre-release-available-on-github.html" />
    <id>tag:blogs.perl.org,2010:/users/clinton_gormley//239.947</id>

    <published>2010-08-26T11:40:30Z</published>
    <updated>2010-08-26T11:49:28Z</updated>

    <summary>I&apos;ve just pushed ElasticSearch::AnyEvent - this brings async requests to the Perl ElasticSearch client. It is still lacking proper documentation and any tests, which will soon follow. It is available on the anyevent branch on github This is my first...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="AnyEvent" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="elasticsearchperllucenefulltextsearchanyeventasync" label="elasticsearch perl lucene fulltextsearch anyevent async" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>I've just pushed ElasticSearch::AnyEvent - this brings async requests to the Perl ElasticSearch client. It is still lacking proper documentation and any tests, which will soon follow.</p>

<p>It is available on the <a href="http://github.com/clintongormley/ElasticSearch.pm/tree/anyevent">anyevent branch on GitHub</a>.</p>

<p>This is my first foray into async programming in Perl, so I'd appreciate feedback on the <span class="caps">API</span> and code.</p>

<p>Briefly, it can be used as follows:</p>

<pre><code>use ElasticSearch::AnyEvent();
 my $es = ElasticSearch::AnyEvent-&gt;new(servers=&gt;'127.0.0.1:9200');

 # Blocking

     my $cv = $es-&gt;current_server;
     print $cv-&gt;recv;

     # or

     print $es-&gt;current_server-&gt;recv;

 # Callback

     my $cv = $es-&gt;current_server;
     $cv-&gt;cb(sub {
         my $result = shift || die $@;
         ....
     });

 # Context

     my $cv = $es-&gt;current_server;
     undef $cv;                      # cancels

     $es-&gt;refresh_servers;           # fire-and-forget
     start_event_loop();

     {
        my $cv1 = $es-&gt;foo;
        my $cv2 = $es-&gt;foo;
        $cv2-&gt;cb(...);
     }
     # $cv1 is cancelled
     # $cv2 is backgrounded / fire-and-forget

 # Multitask:

     $es-&gt;multi_task(
         action      =&gt; 'index',
         pull_queue  =&gt; sub {
             # returns a list of HASHes to be passed to $es-&gt;$action(...)
             # can return (eg) 1000 at a time
         },
         on_success  =&gt; sub {                # optional
             my ($args,$result) = @_;
             ...
         },
         on_error    =&gt; sub {                # optional
             my ($error,$args,$queue) = @_;
             ...
         },
     )-&gt;recv || die &quot;Error&quot;;</code></pre>]]>
        
    </content>
</entry>

<entry>
    <title>ElasticSearch at YAPC::EU 2010 - slides live</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2010/08/elasticsearch-at-yapceu-2010---slides-live.html" />
    <id>tag:blogs.perl.org,2010:/users/clinton_gormley//239.846</id>

    <published>2010-08-06T14:17:05Z</published>
    <updated>2010-08-06T14:19:04Z</updated>

    <summary>My slides from my talk at YAPC::EU 2010, &quot;ElasticSearch - You know, for search&quot;, are now available here: http://clintongormley.github.com/ElasticSearch.pm/ElasticSearch_YAPC-EU_2010/...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="elasticsearchperllucenefulltextsearch" label="elasticsearch perl lucene fulltextsearch" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>My slides from my talk at <span class="caps">YAPC</span>::EU 2010, "ElasticSearch - You know, for search", are now available here:</p>

<p><a href="http://clintongormley.github.com/ElasticSearch.pm/ElasticSearch_YAPC-EU_2010/">http://clintongormley.github.com/ElasticSearch.pm/ElasticSearch_YAPC-EU_2010/</a></p>]]>
        
    </content>
</entry>

<entry>
    <title>ElasticSearch at YAPC::EU</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2010/08/elasticsearch-at-yapceu.html" />
    <id>tag:blogs.perl.org,2010:/users/clinton_gormley//239.816</id>

    <published>2010-08-02T12:38:02Z</published>
    <updated>2010-08-02T12:39:43Z</updated>

    <summary>If you&apos;re lucky enough to be coming to YAPC::EU and you&apos;re interested in ElasticSearch, come along to my talk I&apos;ll be giving a brief introduction to what it is, the benefits, and how to talk to ElasticSearch with Perl...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="perlelasticsearchyapceu" label="Perl ElasticSearch YAPC::EU" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>If you're lucky enough to be coming to <span class="caps">YAPC</span>::EU and you're interested in ElasticSearch, come along to <a href="http://conferences.yapceurope.org/ye2010/talk/2907">my talk</a>.</p>

<p>I'll be giving a brief introduction to what it is, the benefits, and how to talk to ElasticSearch with Perl.</p>]]>
        
    </content>
</entry>

<entry>
    <title>ElasticSearch gets facets, scripting and better performance</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2010/08/elasticsearch-gets-facets-scripting-and-better-performance.html" />
    <id>tag:blogs.perl.org,2010:/users/clinton_gormley//239.815</id>

    <published>2010-08-02T12:36:42Z</published>
    <updated>2010-08-02T12:37:22Z</updated>

    <summary>I&apos;ve just released ElasticSearch.pm version 0.18 which supports ElasticSearch version 0.9.0 Some of the major new features in ElasticSearch 0.9.0 are (or see the Detailed release notes ) Facets support ES now supports a wide range of facets ie aggregated...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="perlelasticsearchcpan" label="Perl ElasticSearch CPAN" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>I've just released <a href="http://search.cpan.org/~drtech/ElasticSearch-0.18/lib/ElasticSearch.pm">ElasticSearch.pm version 0.18</a>, which supports <a href="http://www.elasticsearch.com/blog/">ElasticSearch version 0.9.0</a>.</p>

<p>Some of the major new features in ElasticSearch 0.9.0 are listed below (or see the <a href="http://wiki.github.com/elasticsearch/elasticsearch/release-notes">detailed release notes</a>):</p>

<h2><a href="http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/facets">Facets support</a></h2>

<p>ES now supports a wide range of facets, i.e. aggregated counts based on your current search:</p>

<h3>* <a href="http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/facets/terms_facet/">Term facets</a></h3>

<p>Returns the most common terms associated with the current query, which could be used, e.g., to populate auto-complete fields.</p>

<h3>* <a href="http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/facets/statistical_facet/">Statistical facets</a></h3>

<p>Returns statistical information on numeric fields, e.g. count, total, min, max, variance, sum of squares, standard deviation.</p>

<h3>* <a href="http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/facets/histogram_facet/">Histogram facets</a></h3>

<p>Breaks the value of a numeric or date field up into buckets, which can be used to draw histograms, e.g. the number of posts per week.</p>

<h2>Scripting support</h2>

<p>The <a href="http://mvel.codehaus.org/">mvel</a> dynamic scripting language can be used for:</p>

<h3>* <a href="http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/script_fields">Script fields</a></h3>

<p><code>script_fields</code> can use stored values to return dynamically generated values.</p>

<h3>* <a href="http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/custom_score_query">Custom score</a></h3>

<p><code>custom_score</code> queries can be used to order the results by some value other than just relevance, e.g. you could use a <code>publish_date</code> field to increase the relevance of more recent results.</p>

<h2>Improved gateway recovery</h2>

<p>Restarting a node used to mean that it had to retrieve all the data either from the gateway (permanent index store) or from the primary shard. Now, the node can reuse existing index files, which greatly speeds up the recovery process.</p>

<p>Also included is the ability to control when the initial recovery will happen as a factor of the number of nodes in the cluster and time. </p>

<p>This does require you to reindex your data when upgrading from 0.8.0 to 0.9.0.</p>

<h2>Stability, Bug Squashing, and Memory Usage Improvements</h2>

<p>A lot of work has gone into improved stability, better memory management, and major bug squashing. Several companies are already using snapshot versions of 0.9 successfully to index very large amounts of data on large clusters.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Released ElasticSearch 0.12 and Alien::ElasticSearch 0.10</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2010/04/released-elasticsearch-012-and-alienelasticsearch-010.html" />
    <id>tag:blogs.perl.org,2010:/users/clinton_gormley//239.489</id>

    <published>2010-04-16T17:34:22Z</published>
    <updated>2010-04-16T17:44:51Z</updated>

    <summary>Just released the above two modules to support the ElasticSearch server version 0.6.0, which you can read about here: http://www.elasticsearch.com/blog/2010/04/09/0.6.0_released.html New features include: support for the _all field, which allows you to search across all indexed fields fuzzy_like_this and more_like_this...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="elasticsearchperllucenefulltextsearch" label="elasticsearch perl lucene fulltextsearch" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>Just released the above two modules to support the ElasticSearch server version 0.6.0, which you can read about here: <a href="http://www.elasticsearch.com/blog/2010/04/09/0.6.0_released.html">http://www.elasticsearch.com/blog/2010/04/09/0.6.0_released.html</a></p>

<p>New features include:</p>

<ul>
<li>support for the <code>_all</code> field, which allows you to search across all indexed fields</li>
<li><code>fuzzy_like_this</code> and <code>more_like_this</code> queries</li>
<li>simpler <code>range</code> queries using Perl-style <code>gt</code>, <code>gte</code>, <code>lt</code> and <code>lte</code> operators</li>
<li>Index aliases</li>
<li>attachment indexing</li>
<li>search highlighting</li>
<li>... and more</li>
</ul>

<p>The ElasticSearch API has gone through a big rename, moving from camelCase to the more Perlish underscore_style.</p>

<p>You can download them from CPAN (as soon as they arrive) or here:</p>

<ul>
<li>ElasticSearch: <a href="http://github.com/clintongormley/ElasticSearch.pm/downloads">http://github.com/clintongormley/ElasticSearch.pm/downloads</a></li>
<li>Alien::ElasticSearch: <a href="http://github.com/clintongormley/Alien-ElasticSearch/downloads">http://github.com/clintongormley/Alien-ElasticSearch/downloads</a></li>
</ul>
]]>
        

    </content>
</entry>

<entry>
    <title>Alien::ElasticSearch 0.05 and ElasticSearch.pm 0.03</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2010/02/alienelasticsearch-005-and-elasticsearchpm-003.html" />
    <id>tag:blogs.perl.org,2010:/users/clinton_gormley//239.291</id>

    <published>2010-02-21T18:30:28Z</published>
    <updated>2010-02-21T18:36:59Z</updated>

    <summary>Just released Alien::ElasticSearch (version 0.05 on its way to a CPAN near you). This downloads, builds and installs the latest version of ElasticSearch from GitHub, which makes a live server available for automated testing of.... ElasticSearch.pm v 0.03, also on...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
    <category term="elasticsearchperllucenefulltextsearch" label="elasticsearch perl lucene fulltextsearch" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>Just released <a href="http://search.cpan.org/perldoc?Alien::ElasticSearch">Alien::ElasticSearch</a> (version 0.05 on its way to a CPAN near you).</p>

<p>This downloads, builds and installs the latest version of <a href="http://www.elasticsearch.com/">ElasticSearch</a> from GitHub, which makes a live server available for automated testing of....</p>

<p><a href="http://search.cpan.org/perldoc?ElasticSearch">ElasticSearch.pm v 0.03</a>, also on its way to a CPAN near you.</p>

<p>This version is completely rewritten to make it easier to extend later (more like one big dispatch table), and has improved debugging, usage messages, errors etc.</p>

<p>Try: (with a server running on localhost)</p>

<pre><code>   use ElasticSearch;
   my $e = ElasticSearch-&gt;new( 
         servers  =&gt; '127.0.0.1:9200',
         trace_calls =&gt; 1,
   );

   $e-&gt;nodes;
</code></pre>

<p>... prints to STDERR:</p>

<pre><code>curl -XGET 'http://127.0.0.2:9200/_cluster/nodes' 
# {
#    "clusterName" : "elasticsearch",
#    "nodes" : {
#       "getafix-25528" : {
#          "httpAddress" : "inet[/127.0.0.2:9200]",
#          "dataNode" : true,
#          "transportAddress" : "inet[getafix.traveljury.com/127.0.0.2:9300]",
#          "name" : "Miguel Espinosa"
#       }
#    }
# }
</code></pre>

<p>i.e. the <code>curl</code> command, which allows you to rerun your requests directly from the command line, and the ElasticSearch response, commented out.</p>

<p>There is also a test suite, which has helped to find a number of issues (now fixed) in ElasticSearch itself.</p>

<p>I'm thinking of adding an ElasticSearch::QueryBuilder to make it easier to generate the right query structure that Lucene and derivatives expect.  (See <a href="http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/4443e364e543bb53#">here</a>  for an example of just how convoluted the query structure can be - note, that was my first attempt at a query, not sure if it is correct or not)</p>
]]>
        

    </content>
</entry>

<entry>
    <title>How should I write a test suite which depends on an external server?</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/clinton_gormley/2010/02/how-should-i-write-a-test-suite-which-depends-on-an-external-server.html" />
    <id>tag:blogs.perl.org,2010:/users/clinton_gormley//239.287</id>

    <published>2010-02-18T12:38:55Z</published>
    <updated>2010-02-18T12:39:41Z</updated>

    <summary>I&apos;m in the process of writing a test suite for ElasticSearch.pm, but in order for it to run any tests, it requires access to an ElasticSearch cluster. Currently, I just skip all tests unless $ENV{ES_SERVER} is set, but this requires...</summary>
    <author>
        <name>Clinton Gormley</name>
        
    </author>
    
        <category term="ElasticSearch" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="perlelasticsearchcpantesting" label="Perl ElasticSearch CPAN testing" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/clinton_gormley/">
        <![CDATA[<p>I'm in the process of writing a test suite for <a href="http://search.cpan.org/search?query=elasticsearch&amp;mode=all">ElasticSearch.pm</a>, but in order for it to run any tests, it requires access to an <a href="http://www.elasticsearch.com">ElasticSearch</a> cluster.</p>

<p>Currently, I just skip all tests unless <code>$ENV{ES_SERVER}</code> is set, but this requires manual installation / testing.</p>
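
<p>A minimal sketch of that approach, assuming a plain Test::More test file (the skip message and the placeholder test are illustrative, not taken from the actual suite):</p>

<pre><code>    use strict;
    use warnings;
    use Test::More;

    # Skip the whole file unless a server has been configured
    plan skip_all =&gt; 'Set $ENV{ES_SERVER} to run the live tests'
        unless $ENV{ES_SERVER};

    # ... live tests against $ENV{ES_SERVER} would go here ...
    pass('ES_SERVER is set');

    done_testing();
</code></pre>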

<p>Alternatively, I could (if <code>$ENV{ES_SERVER}</code> isn't set) try to download and compile a test version, which requires <code>git</code> and <code>java</code> v 1.6 or higher. It doesn't take long to compile, but long enough that a user may not want to do it by default.</p>

<p>So I could ask them if they want the script to build a test server, but again, this requires manual installation.</p>

<p>It'd be nice to use the CPAN Testers to run the test suite on multiple platforms, which implies building a test cluster by default.</p>

<p>What would you do?</p>
]]>
        

    </content>
</entry>

</feed>
