Better Rate Limiting in .NET
by Jack on October 17, 2010
in Software Development
Software engineers usually try to make software that does things as quickly as possible. But occasionally we need to limit the rate at which we perform some action. For example, if we’re building a web crawler, we might rein it in so it doesn’t overwhelm a server with rapid-fire HTTP requests.
Sometimes we can get away with just slowing things down by introducing a delay. Have you ever written code like this?
for (var i = 0; i < 1000; i++)
{
    PerformAction();
    Thread.Sleep(500);
}
A delay between action executions is the simplest form of rate limiting, but it’s not very precise. The delay time is based on an estimate of how long the action will take to execute. If the action takes less time than estimated, then we might exceed the rate limit. Or if the action takes more time than estimated, then we might limit performance more than we need to. If the action is very complicated, we may not be able to easily estimate how long it will take. And how would this work if the action is being performed in parallel by multiple threads?
If we need a more precise rate control method, then we could try to limit the number of times we execute the action within distinct windows of time. We count the number of executions until we reach the limit, and then we don’t allow any more until the count is reset to zero at the end of the time window. For example, suppose we’re aiming for a rate limit of 2 per second. The algorithm would allow only two executions between 0.0 and 1.0 seconds, two more between 1.0 and 2.0 seconds, and so on. This algorithm is effective at controlling the average rate, but it allows for bursts up to two times the rate limit. Using the parameters above, it would allow action executions at 0.8, 0.9, 1.1, and 1.2 seconds, which is 4 occurrences in less than half a second!
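A minimal sketch of such a fixed-window counter might look like this (the class and member names are illustrative, not from any source discussed here):

```csharp
using System;

// A simple fixed-window rate limiter: allows at most 'limit' actions
// per window, resetting the count whenever a new window begins.
// Illustrative sketch only; note the burst problem described above.
public class FixedWindowLimiter
{
    private readonly int _limit;
    private readonly long _windowMs;
    private long _windowStart;
    private int _count;

    public FixedWindowLimiter(int limit, TimeSpan window)
    {
        _limit = limit;
        _windowMs = (long)window.TotalMilliseconds;
        _windowStart = Environment.TickCount;
    }

    // Returns true if the action is allowed in the current window.
    public bool TryProceed()
    {
        long now = Environment.TickCount;
        if (now - _windowStart >= _windowMs)
        {
            // A new window has started: reset the counter.
            _windowStart = now;
            _count = 0;
        }
        if (_count < _limit)
        {
            _count++;
            return true;
        }
        return false;
    }
}
```

Nothing in this limiter prevents two executions at the very end of one window and two more at the very start of the next, which is exactly the burst described above.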
Yet another possible method is to use a rate limiting algorithm like the Leaky Bucket. Fortunately these aren’t terribly difficult to implement. But, like the algorithm above, they’re really designed for average rate limiting. They usually allow the rate limit to be exceeded for short bursts.
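For reference, here is a bare-bones Leaky Bucket sketch (the "bucket as a meter" variant; names and parameters are illustrative):

```csharp
using System;

// A bare-bones Leaky Bucket used as a meter: each action adds one unit
// to the bucket, which drains at a fixed rate. Actions are allowed
// while the bucket has room. Illustrative sketch only.
public class LeakyBucket
{
    private readonly double _capacity;
    private readonly double _leakPerMs;
    private double _level;
    private long _lastMs;

    public LeakyBucket(double capacity, double leakPerSecond)
    {
        _capacity = capacity;
        _leakPerMs = leakPerSecond / 1000.0;
        _lastMs = Environment.TickCount;
    }

    public bool TryProceed()
    {
        // Drain the bucket according to the time elapsed since last check.
        long now = Environment.TickCount;
        _level = Math.Max(0.0, _level - (now - _lastMs) * _leakPerMs);
        _lastMs = now;

        if (_level + 1.0 <= _capacity)
        {
            _level += 1.0;  // this action occupies one unit of the bucket
            return true;
        }
        return false;
    }
}
```

With a capacity greater than one, a full bucket's worth of actions can pass in a single instant, which is the short-burst behavior noted above.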
So what would we want of a better rate limiting method? Well, we would want it to limit the number of occurrences N of an action within any time period of length M (i.e. no bursts allowed). It would allow our code to perform as close to the rate limit as possible without exceeding it. And it would be able to enforce the rate limit whether the action is performed in a single thread or by multiple threads in parallel.
The Rate Limited Pizzeria
Suppose we’re in charge of making pizzas at a pizzeria and we have a pizza oven that can hold 6 pizzas at once. The pizza maker assembles the pizzas and then places them in the oven to bake. After each pizza bakes for exactly 30 minutes, another person removes it from the oven, slices it, and boxes it for delivery.
Each pizza that we place in the oven uses one unit of the oven’s capacity for exactly 30 minutes. It’s not possible to place more than 6 pizzas in the oven in any 30 minute period. In more academic terms, a system with processing capacity N that processes each item in time M has a maximum processing rate of N/M.
Note that the baking time and oven capacity still limit the rate at which pizzas can be placed in the oven even if more than one pizza maker is making the pizzas. When the oven is full, the pizza makers will have to wait in line to place their pizzas in the oven.
Building the RateGate
Using the pizzeria model, if we can create a software object that behaves like the pizza oven and then require that we “place a pizza in the oven” before performing an action, then we effectively limit the rate at which we perform the action. We’ll call this object a RateGate because we’ll have to “pass through” it before we perform the action that we want to rate limit.
Code that requires rate control will configure a RateGate instance, specifying the rate limit by number of occurrences and duration of the time period. Before performing the action, the code will call RateGate.WaitToProceed() which will block the current thread until the action is allowed to occur. Calling WaitToProceed() is like trying to add a pizza to the oven; if the oven is full (i.e. we have hit our rate limit), then the caller will have to wait until a pizza is removed from the oven. For example, the following code calls PerformAction() 1,000 times at a rate of at most 2 times per second:
// Create a RateGate that allows 2 occurrences per second.
using (var rateGate = new RateGate(2, TimeSpan.FromSeconds(1)))
{
    for (var i = 0; i < 1000; i++)
    {
        rateGate.WaitToProceed();
        PerformAction();
    }
}
As you can see, the RateGate is very simple to use. Let’s take a peek inside to see how it works.
The RateGate contains a counting semaphore that acts like the pizza oven. When a thread calls WaitToProceed(), it will have to wait in line until the semaphore value is less than the count (just like waiting until the number of pizzas in the oven is less than the oven’s capacity). It can then enter the semaphore (i.e. place a pizza in the oven) and proceed with the action. For our RateGate implementation in .NET 4.0, we’ll use a SemaphoreSlim as our counting semaphore. In previous versions of .NET, we could use a regular old Semaphore.
At some point we’ll need to exit the semaphore (i.e. take a pizza out of the oven). Before exiting WaitToProceed(), we compute the time when this should occur and add it to a queue. A queue is convenient because the next semaphore exit time can always be found at the front of the queue.
How do we then exit the semaphore at the time that we computed? We can’t depend on the thread that entered the semaphore to do it; that thread has moved on to other things like performing the action that we’re rate limiting. Instead we’ll take care of this in a separate management thread: a timer wakes it up, it exits the semaphore once for each queued exit time that has already passed, and then it schedules the next wake-up based on the exit time at the front of the queue.
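In outline, the two halves look something like this. This is a simplified sketch of the idea, not the actual RateGate source; error handling, disposal, and precise timer rescheduling are omitted, and the names are mine:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Simplified sketch of the RateGate internals described above.
public class MiniRateGate
{
    private readonly SemaphoreSlim _semaphore;        // "the oven"
    private readonly ConcurrentQueue<int> _exitTimes; // pending exit times
    private readonly int _timeUnitMs;                 // length of the time period
    private readonly Timer _exitTimer;                // the management side

    public MiniRateGate(int occurrences, TimeSpan timeUnit)
    {
        _semaphore = new SemaphoreSlim(occurrences, occurrences);
        _exitTimes = new ConcurrentQueue<int>();
        _timeUnitMs = (int)timeUnit.TotalMilliseconds;
        // First check after one time unit, then poll frequently.
        _exitTimer = new Timer(ExitTimerCallback, null, _timeUnitMs, 15);
    }

    public void WaitToProceed()
    {
        _semaphore.Wait();  // wait for an oven slot
        // Record when this slot should be released again.
        _exitTimes.Enqueue(unchecked(Environment.TickCount + _timeUnitMs));
    }

    // Timer callback: release every slot whose exit time has passed.
    private void ExitTimerCallback(object state)
    {
        int exitTime;
        while (_exitTimes.TryPeek(out exitTime)
               && unchecked(exitTime - Environment.TickCount) <= 0)
        {
            _exitTimes.TryDequeue(out exitTime);
            _semaphore.Release();  // take a pizza out of the oven
        }
    }
}
```

Note that the queue stays sorted for free: exit times are enqueued in the order the semaphore is entered, so the earliest one is always at the front.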
We have to ensure that our queue is thread safe because multiple threads may try to add semaphore exit times to the queue concurrently. And the management thread may also be simultaneously removing exit times. In .NET 4.0, we’ll use a ConcurrentQueue<int>, which will handle all of the synchronization internally. In previous versions of .NET, we could use a Queue<int>, but we would have to implement our own thread synchronization.
The RateGate is obviously very time dependent, so we need to be careful how we measure time. The system clock has a resolution of around 16ms, which should be more than sufficient for general rate limiting; there’s no need to use a high resolution timer. We could try to use the current system time but we can’t depend on it to be consistent, even if we avoid daylight saving time by using UTC. The system time could still be changed by a user, or it could be adjusted automatically when synchronizing with an external time source. A better clock for our purposes is System.Environment.TickCount, which gives us the number of milliseconds since the computer started. We just have to be careful because this value is stored as a 32-bit signed integer. The value starts at zero and increases for around 24.9 days until it wraps from Int32.MaxValue (2,147,483,647) to Int32.MinValue (-2,147,483,648). Then it continues increasing until it gets back to Int32.MaxValue. It’s easy enough to account for this value wrapping by using unchecked arithmetic to compute differences between time values. The downside is that we won’t be able to handle any time periods greater than around 49.8 days. But that should be okay; we’re not really looking to limit the number of occurrences over a period that long.
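The wrap-safe subtraction mentioned above can be written as a small helper (illustrative, not from the RateGate source):

```csharp
using System;

static class TickMath
{
    // Elapsed milliseconds between two Environment.TickCount readings.
    // The unchecked subtraction stays correct across the wrap from
    // Int32.MaxValue to Int32.MinValue; casting the difference to uint
    // extends the measurable range to just under 2^32 ms (~49.7 days).
    public static uint Elapsed(int startTicks, int endTicks)
    {
        return unchecked((uint)(endTicks - startTicks));
    }
}
```

For example, a reading taken 500 ms before the wrap and another taken 501 ms after it still yield an elapsed time of 1,001 ms.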
That covers the general idea behind the RateGate. Please check out the source code (.NET 4.0) for additional details. There are plenty of code comments that explain the “why” and “how”.
Rate Limiting a Loop
As a simple example, let’s use the code above that limits calls to PerformAction() to two per second. We’ll simulate PerformAction() by sleeping for 250ms and we’ll record its start and end times. The results with and without rate limiting are below. Notice the delay introduced by the RateGate that keeps the rate below the limit.
Executing 10 times with no rate limiting:

    Iteration    Start    Finish
        1            0       249
        2          250       500
        3          500       750
        4          751      1000
        5         1000      1250
        6         1250      1500
        7         1501      1750
        8         1750      2015
        9         2016      2265
       10         2266      2515

Executing 10 times with a rate limit of 2 per second:

    Iteration    Start    Finish
        1           17       266
        2          266       516
        3         1012      1262
        4         1273      1523
        5         2022      2272
        6         2287      2537
        7         3036      3286
        8         3301      3551
        9         4050      4300
       10         4315      4565
Rate Limiting LINQ
Another relatively simple application is using a RateGate to limit the rate at which a LINQ query is processed, via an IEnumerable<T> extension method:
public static IEnumerable<T> LimitRate<T>(this IEnumerable<T> sequence,
    int items, TimeSpan timePeriod)
{
    using (var rateGate = new RateGate(items, timePeriod))
    {
        foreach (var item in sequence)
        {
            rateGate.WaitToProceed();
            yield return item;
        }
    }
}
We can then use this extension method to limit the rate at which a sequence is enumerated:
var stopwatch = new Stopwatch();
var sequence = Enumerable.Range(1, 20);

stopwatch.Start();

// Limit the rate of processing the sequence to 5 items per second.
foreach (var number in sequence.LimitRate(5, TimeSpan.FromSeconds(1)))
    Console.WriteLine("{0,4} {1,6}", number, stopwatch.ElapsedMilliseconds);
Enumerating sequence with rate limit of 5 items per second:

       1     14
       2     33
       3     34
       4     34
       5     34
       6   1026
       7   1029
       8   1029
       9   1029
      10   1029
      11   2040
      12   2040
      13   2040
      14   2040
      15   2040
      16   3058
      17   3058
      18   3058
      19   3058
      20   3059
Be careful, though, if using this extension method in a PLINQ query. PLINQ can use a chunking strategy in which it pulls a chunk of items from the sequence before distributing the items for parallel processing. In the code below, suppose PLINQ starts by pulling a chunk of 50 items from the sequence. That part of the processing will be rate limited, taking around 10 seconds to complete. Then PLINQ will start calling PerformAction() for the items that it pulled from the sequence. The calls to PerformAction() will likely be executed in multiple threads, and without any rate limiting at all!
// Warning: Might not work as expected!
Enumerable.Range(1, 1000)
    .LimitRate(5, TimeSpan.FromSeconds(1))
    .AsParallel()
    .ForAll(x => PerformAction(x));
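One way to keep the limit effective under PLINQ (a sketch using the RateGate API shown earlier) is to apply the gate inside the parallel action itself, after PLINQ has already partitioned the work:

```csharp
// Apply the rate limit inside the parallel action, so every call to
// PerformAction passes through the gate regardless of how PLINQ
// chunks and schedules the items.
using (var rateGate = new RateGate(5, TimeSpan.FromSeconds(1)))
{
    Enumerable.Range(1, 1000)
        .AsParallel()
        .ForAll(x =>
        {
            rateGate.WaitToProceed();
            PerformAction(x);
        });
}
```

This is the same pattern as the Parallel.For example in the next section: the gate is shared across threads, and each thread waits its turn just before performing the action.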
Rate Limiting and Concurrency
As a final example, let’s use a RateGate to rate limit across multiple threads. We’ll use the Task Parallel Library to execute the rate limited action:
// Create a RateGate that allows 2 occurrences per second.
using (var rateGate = new RateGate(2, TimeSpan.FromSeconds(1)))
{
    Parallel.For(0, 10, i =>
    {
        rateGate.WaitToProceed();
        PerformAction();
    });
}
The TPL performs the processing out of sequence and introduces some delays, but the RateGate still limits the rate at which the items are processed.
Executing 10 times with no rate limiting:

    Iteration    Start    Finish
        1           50       299
        6           61       311
        2           66       316
        3          300       550
        7          317       567
        5          317       567
        4          550       800
        8          567       817
        9          568       817
       10          817      1067

Executing 10 times with a rate limit of 2 per second:

    Iteration    Start    Finish
        6            5       255
        1            7       257
        2         1980      2229
        7         1980      2230
        3         2987      3237
       10         2987      3237
        5         4001      4251
        4         4001      4251
        8         5015      5265
        9         5266      5515
Download the Code
I hope you find the RateGate interesting and/or useful! You can grab the .NET 4.0 source code here. Please leave a comment if you have suggestions for improvements or if you find a neat use for it. Cheers!
Tagged as: .NET, .NET 4.0, C#, LINQ, Multithreading
Comments
Great article Jack!
At some point you are saying: “The downside is that we won’t be able to handle any time periods greater than around 49.8 days. But that should be okay; we’re not really looking to limit the number of occurrences over a period that long.”
Could you explain a little bit what is the actual behaviour? What will happen if you specify a time period over 49.8 days in RateGate constructor?
Cheers,
Robert
@Robert: I chose to have the RateGate constructor throw an ArgumentOutOfRangeException when the time period is greater than 2^32 milliseconds to avoid the behavior entirely, but here’s why:
Environment.TickCount is a 32-bit (signed) integer. So the difference between any two Environment.TickCount values is at most (2^32 – 1) milliseconds, which is about 49.8 days. So that’s the maximum amount of time difference that we can measure.
We can try to measure a greater time difference, but the result won’t be correct. For example, the TickCount when the computer starts is zero. Exactly 50 days later, 4,320,000,000 milliseconds will have passed. Since the counter cycles every 2^32 milliseconds, the TickCount value at that time will be 4,320,000,000 − 4,294,967,296 = 25,032,704. If we take the difference in TickCount values (25,032,704 − 0) to compute the elapsed time, the result is only about 7 hours.
How safe do you think it would be to create multiple instances of this for different tenants in a multi-tenanted environment? Each client gets its own rate limiting. Do you see drawbacks to initializing one of these for each client on a server?