Hash::ConstantHash

By Luben Karavelov on April 17, 2011 3:45 AM under cpan, perl

Some time ago I searched for constant hash algorithm at CPAN and found nothing. So I wrote one. Here it is - distribution can be found on CPAN, development tree is on github.

The algorithm is well-known and shared by most of constant hash implementations. At Last.fm they have written a C library called ketama. The algorithm I have implemented is the same but written in pure perl with only one dependency - String::CRC32.

Here are some scenarios I have used it in mass-hosting environment project:

Memcached redundancy and keys distribution

We used some number of memcached servers but we wanted the data that we put in them to survive single machine reboot. We also wanted when we add or remove memcached servers only small fraction of the keys to be reassigned to new servers. So I have written a minimal Cache::Memcached frontend (ping me if you are interested, I will publish it) that stores the key-value pairs in N redundant servers (usually N=2) and when looks up a key it tries these N servers until it finds the key. For key distribution it uses Hash::ConstantHash.

This brings one of the features of the module that is not common. If we have configured M hash buckets (servers in memcached example) , when looking up a key it will return non-repeating answers until all buckets are exhausted - this guarantees that it we want N-way redundancy we actually get N-way redundancy, not N-1.

Mailbox distribution

Our mail servers used a shared storage for all the mailboxes, actually every box could be accessed from every server but this was not optimal because of the locks and filesystem caches. So we used this module in load-balancing IMAP/POP3 access with nginx (great software by the way that have just reached version 1.0.0). This gave us some some nice properties - every mailbox is accessed from only one server, if we bring down one server for maintenance, only small fraction of mailbox access is reassigned to other severs.

I am sure it will find other uses, these are just examples where the code was used in production. Though it was field tested there could be bugs - feel free to file a bug report or ask for features.

6 comments

Tagged as:

hash constant

6 Comments

rhesa | April 17, 2011 8:50 AM | Reply

I think the reason you didn't find anything on CPAN is that this is usually called "consistent hashing". There are at least two on CPAN that I know of: Set::ConsistentHash, Algorithm::ConsistentHash::Ketama. Furthermore, libmemcached has a couple of consistent hashing algorithms baked in, and I believe Cache::Memcached::Fast has its own internal version.

Luben Karavelov | April 17, 2011 4:51 PM | Reply

Ouch, you are right. It's a shame I have not found Set::ConsistentHash module.

On the other side, I was aware that memcached libraries had such algorithms baked in but it was only used for key distribution, I have not found a way to make them store a key in number of redundant servers.

A quick benchmark I have made now shows that Set::ConsistentHash is 10 times faster than my module when looking for only one bucket(target) for given key. But when looking for more than one bucket(target) my module is more than 1000 times faster.

May be I should rename it to Hash::ConsistentHash and re-release it.

Mithaldu | April 17, 2011 5:57 PM | Reply

Your synopsis could be improved to demonstrate an actually working example that people can copy/paste into a .pl file and run it to see what happens. :)

Luben Karavelov | April 17, 2011 8:32 PM | Reply

Thanks for the suggestions, I have uploaded new distribution to github and CPAN. The module is renamed to Hash::ConsistentHash, examples in the documentation are updated - now you could just copy/paste them in .pl file.

One minor change to the API is also made - instead of hard wiring crc32 hash function, it is now supplied by the user.

targ.myopenid.com | April 17, 2011 10:53 PM | Reply

You should also mention in the docs what sets your module apart from the other two mentioned above by rhesa.

Luben Karavelov | April 18, 2011 3:09 AM | Reply

Yes, you are right, I have added a paragraph with specifics of the module.

Also I have reworked slightly the algorithm in order to expose get_bucket method with the same semantics and performance as in Set::ConsistentHash.

Name

Email Address

URL

Remember personal info?

Comments (You may use HTML tags for style)

About Luben Karavelov

I blog about Perl.

More info »

Luben Karavelov