A distributed design challenge
I have a distributed design problem that I thought Monks might find interesting and perhaps be willing to offer suggestions on.
I'm writing real time bidding software. It works like this:
- My software receives a request to bid on an auction
- I return a bid on said request (I must respond in 85 to 100 milliseconds)
- Later, I receive a notification on whether or not I won the auction
Point number 3 above means I don't find out if I've won a request until shortly after I've made the request.
Currently, on a server farm of about 36 boxes, we're receiving roughly 400 bid requests a second, or about 11 requests/second per box. This is an extremely light load and will likely increase by one or two orders of magnitude when the system goes live.
We have various campaigns, each with individual daily and total budgets that cannot be overspent (a bit of wiggle room is allowed). The servers run separate databases and the only shared data source we have is a Redis server with two slaves, though we could implement others.
So that's the setup. Now on to the problem.
(The monetary numbers are fictional)
Several times we've been close to our budget and we've had 16 servers respond simultaneously with winning bids, pushing us further over our budget than we would like. We keep track of our spending in Redis, so if all 36 servers respond to a bid at the same time, they might see that they've spent $99 (at a budget of $100), but not realize that all of them winning will push us to $135, or 35% over our limit.
In other words, it's impossible for the servers to know that other servers are bidding at the same time, or whether or not they'll win.
One naive strategy is to have each server register with Redis and receive a unique, sequential number. Each server, when receiving a bid request, will do a modulus operation against a time segment to decide if it should bid or not. This requires that all 36 servers have their time synchronized (not too big of a deal), but effectively reduces our throughput capacity to 1/36th of what we would like.
Further, we have multiple campaigns. We might want to have server/campaign combinations, so if server 1 can bid on campaign A right now, then maybe server 2 is bidding on campaign B, thus ensuring that servers are not running idle (I can't tell you the number of campaigns, but there are many).
But then what happens if we get an important, but low-volume campaign where we get a bid request every five minutes or so? It's very conceivable that it will rarely, if ever, hit the server which is allowed to "handle" that campaign at the time. Further, calculating how often a particular campaign comes in is very hard (perhaps not NP hard, but still hard).
Does anyone have experience here? Or do you have thoughts on how to approach this? Writing everything to a central RDBMS could cause serious contention issues as we'd use transactions to lock critical data. Any solution that increases response times by more than a few milliseconds is likely to be problematic.