Lock me away
I decided from the beginning that I wanted a well-defined API for the storage layer, so that as different back ends would be easy to write and swap around. While writing the tests for the first one, I encountered a problem that’s been haunting me since: how to deal with locks.
Originally, I imagined a job would either be locked, or not locked and manager processes would be in charge of releasing locks if workers died / failed. Then I started thinking, what if a manager process dies? Who releases the locks? So I decided to add time-based automatic lock expiration. This introduced a new problem, a possible race condition. What if a job takes so long that it’s lock times out before the job is done? What do I do then?
A possibility is to have a manager process / thread that monitors its workers and updates the lock-expiry time, which is the route I am likely to take. I don’t particularly like it, but I can’t come up with anything better right now.
I’d love to hear from anyone that has any experience with this kind of problem.