Elasticsearch Custom Scoring
Elasticsearch has a built-in scoring algorithm that works quite well in practice, but sometimes you want to roll your own. Let's examine how to create a custom scoring algorithm using the function score query.
Let's assume we want to search and score Perl job offers based on a set of weighted keywords. To get the score of each offer, we'll multiply the weights of the matching keywords, where positive keywords have weights greater than one and negative keywords have weights less than one. Thus, positive keyword matches raise the final product (the score) and negative keyword matches lower it.
Define the Qualitative Importance Levels and Their Weights
Qualitative Tags
Instead of assigning weights directly to keywords, let's use qualitative tags of desirability (importance) for each keyword (trait). Borrowing from OkCupid's five-degree scale of importance:
- mandatory
- very
- somewhat
- little
- irrelevant
This takes care of the non-negative traits. Now let's add qualifiers for negative traits using a similar pattern:
- mandatory_neg
- very_neg
- somewhat_neg
- little_neg
Actual Weights
Recall that we multiply the individual weights, so a weight of one marks an irrelevant trait: it contributes nothing to the final product. In this example, I have chosen the following weights, somewhat influenced by the OkCupid example:
    my %scale = (
        irrelevant    => 1,
        little        => 1.5,
        somewhat      => 3,
        very          => 20,
        mandatory     => 100,
        little_neg    => 0.666,
        somewhat_neg  => 0.5,
        very_neg      => 0.05,
        mandatory_neg => 0,
    );
Notice that I've chosen to weight a mandatory_neg keyword as zero, and thus any matching offer will be scored as zero, putting the offer at the bottom of the list. On the other hand, notice that mandatory does not force a matching offer to the top, but it does multiply the running score by 100 which would likely move the offer up in the rankings. Perhaps mandatory would be better named extremely...
Qualitative Keywords
Let's use the following keyword profile to define what we seek in a job offer:
    my %profile = (
        linux         => 'mandatory',
        telecommute   => 'mandatory',
        elasticsearch => 'mandatory',
        moose         => 'very',
        test          => 'very',
        benefits      => 'very',
        math          => 'somewhat',
        sql           => 'somewhat',
        agile         => 'somewhat',
        subversion    => 'little_neg',
        soap          => 'somewhat_neg',
        cvs           => 'very_neg',
        microsoft     => 'very_neg',
        brogrammer    => 'mandatory_neg',
    );
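To make the arithmetic concrete before involving Elasticsearch, here is a minimal local sketch of the same multiplication. The local_score helper and the sample offer texts are made up for illustration, the profile is trimmed to four keywords, and the naive word splitting only loosely approximates what Elasticsearch does when it analyzes a field.

```perl
use strict;
use warnings;

# Weights copied from the article.
my %scale = (
    irrelevant    => 1,
    little        => 1.5,
    somewhat      => 3,
    very          => 20,
    mandatory     => 100,
    little_neg    => 0.666,
    somewhat_neg  => 0.5,
    very_neg      => 0.05,
    mandatory_neg => 0,
);

# A trimmed-down version of the article's profile.
my %profile = (
    linux      => 'mandatory',
    moose      => 'very',
    soap       => 'somewhat_neg',
    brogrammer => 'mandatory_neg',
);

# Hypothetical helper: multiply the weights of every profile keyword
# that appears as a whole word in the text. Starting from 1 means an
# offer with no matching keywords keeps the neutral score of 1.
sub local_score {
    my ($text) = @_;
    my %seen  = map { $_ => 1 } split /\W+/, lc $text;
    my $score = 1;
    for my $keyword ( keys %profile ) {
        $score *= $scale{ $profile{$keyword} } if $seen{$keyword};
    }
    return $score;
}

print local_score('Perl developer: Linux, Moose and some SOAP'), "\n";  # 100 * 20 * 0.5
print local_score('Linux brogrammer wanted'), "\n";                     # 100 * 0
print local_score('Java developer'), "\n";                              # no matches
```

The second offer shows why a zero weight behaves as a veto: once a mandatory_neg keyword matches, no number of positive matches can lift the product above zero.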
Code
In our example, the function score query takes the following form:
    query => {
        function_score => {
            query      => { match_all => {} },
            boost_mode => 'replace',
            score_mode => 'multiply',
            functions  => $functions,
        }
    }
where the $functions are defined as:
Functions
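    my $functions = [
        map {
            # The leading + tells Perl this is a hash reference, not a nested block.
            +{
                # Match the keyword in either the title or the description.
                filter => {
                    bool => {
                        should => [
                            { terms => { 'title'       => [$_] } },
                            { terms => { 'description' => [$_] } },
                        ]
                    }
                },
                # Weight for this keyword, looked up via its qualitative tag.
                boost_factor => $scale{ $profile{$_} },
            }
        } keys %profile
    ];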
Here, we define a filter for each keyword that searches in the title or description, and we weight a matching offer with a boost_factor taken from the scale value we chose for the profile keyword. Notice that we use a score_mode of multiply, which means the final score is the product of the weights of the matching filters. In addition, the boost_mode of replace discards any built-in score that Elasticsearch generates in favor of our custom score.
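It can help to look at what a single entry of the functions list turns into on the wire. The sketch below builds the list for a one-keyword profile and prints it as JSON. Using JSON::PP for the dump is my own choice for inspection, and the one-entry %scale and %profile are cut down from the article's values.

```perl
use strict;
use warnings;
use JSON::PP;

# One keyword is enough to see the shape of a function clause.
my %scale   = ( very  => 20 );
my %profile = ( moose => 'very' );

my $functions = [
    map {
        +{
            filter => {
                bool => {
                    should => [
                        { terms => { title       => [$_] } },
                        { terms => { description => [$_] } },
                    ]
                }
            },
            boost_factor => $scale{ $profile{$_} },
        }
    } sort keys %profile
];

# canonical sorts the hash keys so the output is stable between runs.
my $json = JSON::PP->new->canonical->pretty->encode($functions);
print $json;
```

Each keyword in the profile produces one such clause, so the full profile yields a list of fourteen filter and weight pairs.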
Search Parameters
Let's assume we are using the Elasticsearch Perl client, Search::Elasticsearch, and that our Perl job offers are in the Elasticsearch index jobs under the type perl. Then our search parameters look like:
    use Search::Elasticsearch;

    my %search_params = (
        index => 'jobs',
        type  => 'perl',
        body  => {
            query => {
                function_score => {
                    query      => { match_all => {} },
                    boost_mode => 'replace',
                    score_mode => 'multiply',
                    functions  => $functions,
                }
            }
        }
    );

    my $es      = Search::Elasticsearch->new;
    my $matches = $es->search(%search_params)->{hits}->{hits};
Given the $matches, we can output the matching offers, ordered by score, like so:
    use feature 'say';

    foreach my $job ( @{$matches} ) {
        my $fields = $job->{_source};
        say "Title: ", $fields->{title};
        say "URL: ",   $fields->{url};
        say "Score: ", $job->{_score};
    }
If I load the offers and search using this profile, a couple of jobs rise to the top, one of which is with Craigslist even though it's not listed as telecommute. The high score means it still has a lot of other desirable traits. Perhaps I should give them a call to see if we can work something out :)