Hash::ConstantHash

Some time ago I searched for constant hash algorithm at CPAN and found nothing. So I wrote one. Here it is - distribution can be found on CPAN, development tree is on github.

The algorithm is well-known and shared by most of constant hash implementations. At Last.fm they have written a C library called ketama. The algorithm I have implemented is the same but written in pure perl with only one dependency - String::CRC32.

Here are some scenarios I have used it in mass-hosting environment project:

Memcached redundancy and keys distribution

We used some number of memcached servers but we wanted the data that we put in them to survive single machine reboot. We also wanted when we add or remove memcached servers only small fraction of the keys to be reassigned to new servers. So I have written a minimal Cache::Memcached frontend (ping me if you are interested, I will publish it) that stores the key-value pairs in N redundant servers (usually N=2) and when looks up a key it tries these N servers until it finds the key. For key distribution it uses Hash::ConstantHash.

This brings one of the features of the module that is not common. If we have configured M hash buckets (servers in memcached example) , when looking up a key it will return non-repeating answers until all buckets are exhausted - this guarantees that it we want N-way redundancy we actually get N-way redundancy, not N-1.

Mailbox distribution

Our mail servers used a shared storage for all the mailboxes, actually every box could be accessed from every server but this was not optimal because of the locks and filesystem caches. So we used this module in load-balancing IMAP/POP3 access with nginx (great software by the way that have just reached version 1.0.0). This gave us some some nice properties - every mailbox is accessed from only one server, if we bring down one server for maintenance, only small fraction of mailbox access is reassigned to other severs.

I am sure it will find other uses, these are just examples where the code was used in production. Though it was field tested there could be bugs - feel free to file a bug report or ask for features.

6 Comments

I think the reason you didn't find anything on CPAN is that this is usually called "consistent hashing". There are at least two on CPAN that I know of: Set::ConsistentHash, Algorithm::ConsistentHash::Ketama. Furthermore, libmemcached has a couple of consistent hashing algorithms baked in, and I believe Cache::Memcached::Fast has its own internal version.

Your synopsis could be improved to demonstrate an actually working example that people can copy/paste into a .pl file and run it to see what happens. :)

You should also mention in the docs what sets your module apart from the other two mentioned above by rhesa.

Leave a comment

About Luben Karavelov

user-pic I blog about Perl.