
A Fast Car Needs Good Brakes: How We Added Client Rate Throttling to the Platform API Gem

source link: https://blog.heroku.com/rate-throttle-api-client

When API requests are made one after the other, they'll quickly hit rate limits, and when that happens:

If you provide an API client that doesn't include rate limiting, you don't really have an API client. You've got an exception generator with a remote timer.

— Richard Schneeman (@schneems) June 12, 2019

That tweet spawned a discussion that led to a quest to add rate-throttling logic to the platform-api gem that Heroku maintains for talking to its API in Ruby.

If the term "rate throttling" is new to you, read Rate limiting, rate throttling, and how they work together

The Heroku API uses the Generic Cell Rate Algorithm (GCRA), as described by Brandur in this post, on the server side. Heroku's API docs state:

The API limits the number of requests each user can make per hour to protect against abuse and buggy code. Each account has a pool of request tokens that can hold at most 4500 tokens. Each API call removes one token from the pool. Tokens are added to the account pool at a rate of roughly 75 per minute (or 4500 per hour), up to a maximum of 4500. If no tokens remain, further calls will return 429 Too Many Requests until more tokens become available.

I needed to write an algorithm that never errored as a result of a 429 response. A "simple" solution would be to retry every request that sees a 429, but that would effectively DDoS the API. So I made it a goal for the rate-throttling client to also minimize its retry rate: if the client makes 100 requests and 10 of them come back as 429 responses, its retry rate is 10%. Since the code needed to be contained entirely in the client library, it had to function without distributed coordination between the multiple clients on multiple machines, using only whatever information the Heroku API returned.
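To make the retry rate concrete, here is a minimal sketch of the naive "retry on 429" baseline with a retry counter. The block stands in for whatever HTTP call the client makes; this is not the platform-api gem's real interface, and the response object is assumed to respond to `#status`.

```ruby
# Naive baseline: immediately retry any request that comes back as a 429,
# and count how often that happens. Illustrative only.
def request_with_naive_retry(stats)
  loop do
    response = yield
    stats[:requests] += 1

    return response unless response.status == 429

    stats[:retries] += 1 # every 429 counts against the retry rate
  end
end

stats = { requests: 0, retries: 0 }
# response   = request_with_naive_retry(stats) { connection.get(path: "/apps") }
# retry_rate = 100.0 * stats[:retries] / stats[:requests]
```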

Making client throttling maintainable

Before we can get into what logic goes into a quality rate throttling algorithm, I want to talk about the process that I used as I think the journey is just as fascinating as the destination.

I started out wanting to write tests for my rate-throttling strategy. I quickly realized that while a behavior like "retries a request after a 429 response" is easy to check, a quality judgment like "this rate-throttle strategy is better than the others" could not be checked quite as easily. The solution I came up with was to write a simulator in addition to the tests: I would simulate the server's behavior, then boot up several processes and threads and hit the simulated server with requests to observe how the system behaved.
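To give a flavor of what "simulate the server's behavior" means here, this is a rough sketch built on the token-pool behavior from the API docs quoted above (4,500 tokens max, roughly 75 added per minute). The class and its structure are illustrative; the real simulator is more involved.

```ruby
# A simulated rate-limited server modeled on Heroku's documented token pool.
class SimulatedServer
  MAX_TOKENS     = 4500
  REFILL_PER_SEC = 4500.0 / 3600 # roughly 75 tokens per minute

  def initialize
    @tokens     = MAX_TOKENS.to_f
    @last_check = Time.now
    @mutex      = Mutex.new
  end

  # Returns a fake status code: 200 if a token was available, 429 otherwise.
  def hit
    @mutex.synchronize do
      now = Time.now
      @tokens = [@tokens + (now - @last_check) * REFILL_PER_SEC, MAX_TOKENS].min
      @last_check = now

      if @tokens >= 1
        @tokens -= 1
        200
      else
        429
      end
    end
  end
end

# Several threads standing in for separate clients hammering the simulated server.
server  = SimulatedServer.new
threads = 5.times.map { Thread.new { 100.times { server.hit } } }
threads.each(&:join)
```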

I initially just output values to the CLI as the simulation ran, but found it challenging to make sense of them all, so I added charting. I also found my simulation took too long to run, so I added a mechanism to speed up the simulated time. I used those two outputs to write what I thought was a pretty good rate-throttling algorithm. The next task was wiring it up to the platform-api gem.
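One way to speed up simulated time (an assumption about the approach, not the simulator's actual code) is to route every sleep through a scaling helper:

```ruby
# Scale every sleep call so simulated time runs faster than wall-clock time.
# TIME_SCALE and scaled_sleep are made-up names for illustration.
TIME_SCALE = 10.0 # a "30 minute" simulation finishes in roughly 3 minutes

def scaled_sleep(seconds)
  sleep(seconds / TIME_SCALE)
end
```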

To help out, I paired with a Heroku engineer, Lola. We ended up making several PRs to a bunch of related projects, and that's its own story to tell. Finally, the day came when we were ready to get rate throttling into the platform-api gem; all we needed was a review.

Unfortunately, the algorithm I developed from "watching some charts for a few hours" didn't make a whole lot of sense, and it was painfully apparent that it wasn't maintainable. While I had developed a good gut feel for what a "good" algorithm did and how it behaved, I had no way of solidifying that knowledge into something that others could run with. Imagine someone in the future wants to make a change to the algorithm, and I'm no longer here. The tests I had could prevent them from breaking some expectations, but there was nothing to help them make a better algorithm.

The making of an algorithm

At this point, I could explain the approach I had taken to build an algorithm, but I had no way to quantify the "goodness" of my algorithm. That's when I decided to throw it all away and start from first principles. Instead of asking "what would make my algorithm better," I asked, "how would I know a change to my algorithm is better" and then worked to develop some ways to quantify what "better" meant. Here are the goals I ended up coming up with:

  • Minimize average retry rate: The fewer failed API requests, the better
  • Minimize maximum sleep time: Rate throttling involves waiting, and no one wants to wait for too long
  • Minimize variance of request count between clients: No one likes working with a greedy co-worker, and API clients are no different. No client in the distributed system should be an extreme outlier
  • Minimize time to clear a large request capacity: As the system changes, clients should respond quickly

I figured that if I could generate metrics on my rate-throttle algorithm and compare it to simpler algorithms, then I could show why individual decisions were made.

I moved my hacky simulation scripts into a separate repo and, rather than relying on watching charts and logs, had the simulation produce numbers that could be used to quantify and compare algorithms.
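Those numbers map directly onto the four goals above. Here is a sketch of how they might be computed at the end of a run; the per-client fields are assumptions for illustration, not the simulator's actual data structures.

```ruby
# Hypothetical per-client record of what happened during a simulation run.
Client = Struct.new(:request_count, :retry_count, :max_sleep, keyword_init: true)

def report(clients, time_to_clear_seconds)
  requests = clients.sum(&:request_count)
  retries  = clients.sum(&:retry_count)
  counts   = clients.map(&:request_count)
  mean     = counts.sum.to_f / counts.length
  variance = counts.sum { |c| (c - mean)**2 } / counts.length

  {
    avg_retry_rate:      100.0 * retries / requests,
    max_sleep_time:      clients.map(&:max_sleep).max,
    stdev_request_count: Math.sqrt(variance),
    time_to_clear:       time_to_clear_seconds
  }
end

clients = [
  Client.new(request_count: 120, retry_count: 12, max_sleep: 30.5),
  Client.new(request_count: 98,  retry_count: 4,  max_sleep: 12.0)
]
report(clients, 74.2)
```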

With that work under my belt, I threw away everything I knew about rate-throttling and decided to use science and measurement to guide my way.

Writing a better rate-throttling algorithm with science: exponential backoff

Earlier, I mentioned that a "simple" algorithm would be to retry requests immediately. A step up in complexity and functionality is to retry requests after an exponential backoff (a sketch follows the numbers below). I coded it up and got some numbers for a simulated 30-minute run (which takes 3 minutes of real time):

Avg retry rate:      60.08 %
Max sleep time:      854.89 seconds
Stdev Request Count: 387.82

Time to clear workload (4500 requests, starting_sleep: 1s):
74.23 seconds
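For reference, a minimal sketch of the exponential-backoff strategy being measured here might look like the following. The method name and the `multiplier` keyword are illustrative, not the gem's actual implementation; the `multiplier` is the "factor" discussed next.

```ruby
# Retry after an exponential backoff: sleep, then grow the sleep by a factor.
def request_with_exponential_backoff(multiplier: 2.0, starting_sleep: 1.0)
  sleep_amount = starting_sleep
  loop do
    response = yield                  # make one API request
    return response unless response.status == 429

    sleep(sleep_amount)               # wait before retrying
    sleep_amount *= multiplier        # back off exponentially
  end
end
```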

Now that we've got baseline numbers, how could we work to minimize any of these values? In my initial exponential backoff model, I multiplied the sleep time by a factor of 2.0; what would happen if I increased it to 3.0 or decreased it to 1.2?

To find out, I plugged in those values and re-ran my simulations. I found that both the retry rate and the max sleep time correlated with the backoff factor, but in opposite directions: I could lower the retry rate by increasing the factor (to 3.0), but that increased my maximum sleep time; I could reduce the maximum sleep time by decreasing the factor (to 1.2), but that increased my retry rate.

That experiment told me that if I wanted to optimize both retry rate and sleep time, I could not do it by changing only the exponential factor, since an improvement in one meant a degradation in the other.

At this point, we could theoretically do anything; our metrics judge our success. We could put a cap on the maximum sleep time, for example by writing code that says "don't sleep longer than 300 seconds", but that too would hurt the retry rate. The biggest concern for me in this example is the maximum sleep time: 854 seconds is over 14 minutes, which is WAAAYY too long for a single client to be sleeping.
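For illustration, a cap like that might look something like this; `MAX_SLEEP` and `next_sleep` are hypothetical names, not anything from the gem.

```ruby
# Cap the backoff sleep at 300 seconds, trading more retries for shorter waits.
MAX_SLEEP = 300.0

def next_sleep(current_sleep, multiplier)
  [current_sleep * multiplier, MAX_SLEEP].min
end

next_sleep(200.0, 2.0) # => 300.0 instead of 400.0
```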

I ended up picking the 1.2 factor to decrease that value at the cost of a worse retry rate:

Avg retry rate:      80.41 %
Max sleep time:      46.72 seconds
Stdev Request Count: 147.84

Time to clear workload (4500 requests, starting_sleep: 1s):
74.33 seconds

Forty-six seconds is better than 14 minutes of sleep by a long shot. How could we get the retry rate down?

