This entry will be a bit tech-heavy, but some of our customers enjoy hearing about our technical progress, even when it's behind the scenes. And yes, it's good for everyone to know we're always working here.
Our backend software stack, which supports our API, consists of around 50 independent processes. Some of these processes are clones: duplicates of the same program running at the same time. This pattern of running duplicates improves the resiliency of our API: if any one process has a problem or crashes, the others remain up, operating, and handling any animation creation work that needs to be done. The administrator receives a notification of the crashed process, and debugging can start right away. A replacement process is also started, but there's never any interruption in service, since multiple other processes are already integrated and actively picking up work.
The throughput of the API stack is increased as well, since multiple processes can work in parallel on different API requests. Scalability arrives too: if a larger volume of API calls needs to be handled, we simply increase the number of cloned processes.
This software design pattern is known as Competing Consumers, and here is a good summary article.
Our stack has always had a competing consumers component, and it works well, but in the past it was not optimized for time separation of work requests. Each process repeatedly checks the work queue database for available work. There must be a time delay between work requests; otherwise there's too much overhead and stress on both the network and the CPU, because each process would be 'hammering' the system asking for work.
So, given that there must be a time delay between requests, a solution as simple as a constant-length sleep statement between work requests can initially serve to balance two needs (see the sketch after this list):
a low system load, and also
a rapid turnaround for processing work requests.
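As a rough sketch of that first approach (the names here are illustrative, not our actual code), the worker loop is just a poll-and-sleep cycle:

```python
import time

POLL_SLEEP_SECONDS = 1.0  # hard-coded delay between work requests

def worker_loop(fetch_next_job, process_job):
    """Poll the work queue forever, sleeping a fixed interval between checks.

    `fetch_next_job` returns the next job or None, and `process_job` handles
    one job; both are hypothetical stand-ins for the real queue client.
    """
    while True:
        job = fetch_next_job()
        if job is not None:
            process_job(job)
        time.sleep(POLL_SLEEP_SECONDS)  # same delay no matter how many clones run
```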
This fixed-sleep approach works, but it also has a negative impact when scaling is needed and the number of running clones is increased.
Let's examine the effects of a simple case. Say initially there are 2 processes competing for work, each hard-coded to sleep 1 second between work requests.
The queue server will be getting an average of around 2 requests per second.
These 2 processes could be started exactly 0.5 seconds apart, so that their requests alternate. This both minimizes the maximum wait before new work gets picked up and spreads the requests hitting the queue server evenly, minimizing stress on it.
When perfectly timed and spaced, the queue server would get one work request every 0.5 seconds, alternating between the two processes, with each process repeating its work query to the database every 1 second.
But we know in the real world computer processing isn't perfect. There will be a constant time-skewing of requests between calls, driven by natural random delays on the system and by how much time each work request happens to take. This results in 'request pile-ups': both processes eventually end up requesting work at the same time, inefficiently hammering the queue server in bursts, and also increasing the maximum wait time to almost 1 second.
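A toy simulation makes the drift easy to see. This is not our production code, just a sketch: each simulated process polls every second plus a little random jitter standing in for variable work and network time, and even with a perfectly staggered start, the gap between polls eventually shrinks toward zero.

```python
import random

def min_gap_after_drift(num_procs=2, sleep_s=1.0, jitter_s=0.05, polls=1000):
    """Simulate fixed-sleep polling with random per-poll jitter and return the
    smallest gap observed between any two consecutive queue requests."""
    # Start perfectly staggered: 0.0s and 0.5s for two 1-second sleepers.
    next_poll = [i * sleep_s / num_procs for i in range(num_procs)]
    events = []
    for _ in range(polls):
        i = min(range(num_procs), key=lambda p: next_poll[p])  # next process to poll
        events.append(next_poll[i])
        next_poll[i] += sleep_s + random.uniform(0.0, jitter_s)
    events.sort()
    return min(b - a for a, b in zip(events, events[1:]))

print(min_gap_after_drift())  # typically near 0: the staggering has drifted away
```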
So let's ignore the 'time skew' request pile-up problem for just a moment, and focus on the simpler problem of the hard-coded sleep statement's effect on scaling.
If we scale up the process count by adding 8 processes for a total of 10, each still sleeping a hard-coded 1 second between work requests, then the queue server has to handle 5x as many queue requests: around 10 per second instead of 2.
This is reasonably solved by calculating the sleep length as a function of the number of processes running, so the more processes that are running, the longer each process waits between calls.
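In code, the adjustment is a one-liner (again, hypothetical names):

```python
def scaled_sleep_seconds(num_processes, desired_queue_interval_s=0.5):
    """Stretch each clone's sleep so the queue server still sees roughly one
    request per desired interval, no matter how many clones are running."""
    return desired_queue_interval_s * num_processes

# 2 clones  -> each sleeps 1 second  -> ~2 queue requests per second
# 10 clones -> each sleeps 5 seconds -> still ~2 queue requests per second
```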
Now imagine 10 processes with the above sleep adjustment, each sleeping 5 seconds before querying for work, with the queue server getting a request every 0.5 seconds on average.
… but the skewing-over-time issue remains. With 10 processes running, they could frequently time-skew into a pattern where they are all hammering the queue server at the same time, with a gap of almost 5 seconds before any specific work request gets picked up from the queue server.
Not optimal!
We decided on a solution where we declare a desired time separation between requests arriving at the queue server. So say we declare that the queue server should be queried once every 0.25 seconds. Remember, the goal here is to go light on the queue server and light on the network, but still get work done in a reasonable amount of time.
If we know we want the queue server to be hit every 0.25 seconds, and we also know:
The number of clones
The index of the clone making the calculation
A standard offset for each process group type
The current time, at the moment the process has completed work or is about to ask whether there is work
Then we can calculate a specific time to sleep such that the next request falls on a specific time slot within a one-minute period. Each process then separates its calls to the queue server in a consistent manner, with no time skew, even if the time needed for work requests varies significantly.
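Here's a minimal sketch of that calculation in Python. The names, parameters, and the use of `time.time()` are assumptions for illustration; the real implementation has more moving parts, but the slot math is the essence:

```python
import time

def sleep_until_my_slot(clone_index, num_clones,
                        queue_interval_s=0.25,   # desired gap between queue hits
                        group_offset_s=0.0,      # standard offset per process group type
                        period_s=60.0):          # slots repeat within a one-minute period
    """Sleep until this clone's next wall-clock time slot.

    If the queue server should see one request every `queue_interval_s`, then
    each of the `num_clones` clones polls once per num_clones * queue_interval_s
    seconds, at a phase unique to its index. Anchoring slots to the wall clock
    (rather than to when the last request finished) means variable work time
    and system delays can never accumulate into skew. Assumes the cycle length
    divides `period_s` evenly.
    """
    cycle_s = num_clones * queue_interval_s                  # this clone's polling period
    phase_s = (clone_index * queue_interval_s + group_offset_s) % cycle_s
    now_s = time.time() % period_s                           # position within the minute
    wait_s = (phase_s - now_s) % cycle_s                     # time until the next owned slot
    time.sleep(wait_s)
```

A clone that finishes its work late simply lands on the next slot it owns; it never pushes its neighbors around, so the queue server keeps seeing one request every 0.25 seconds regardless of how long individual jobs take.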
Rather than being hard-coded, our sleep statements are now dynamically determined, adjusting for system time skew, the variable time needed to get work requests done, the number of clones running, and the type of clone running.
This was a fun implementation to make, and we saw the desired effect: much more consistent, lower CPU/network utilization, and much more consistent, reduced API request completion times.
Frequently in software engineering it's best to 'get things working correctly' first, and then circle back later to optimize. It depends on the project's needs of course, but there are definitely times when too much focus is placed on writing perfectly optimized code up front, which only delays getting something useful into users' hands. It's always particularly satisfying to implement a significant performance optimization on a system that has been working for years!