Better Rate Limiting in .NET
by Jack on October 17, 2010
in Software Development
Software engineers usually try to make software that does things as quickly as possible. But occasionally we need to limit the rate at which we perform some action. For example, if we’re building a web crawler, we might rein it in so it doesn’t overwhelm a server with rapid-fire HTTP requests.
Sometimes we can get away with just slowing things down by introducing a delay. Have you ever written code like this?
for (var i = 0; i < 1000; i++)
{
    PerformAction();
    Thread.Sleep(500);
}
A delay between action executions is the simplest form of rate limiting, but it’s not very precise. The delay time is based on an estimate of how long the action will take to execute. If the action takes less time than estimated, then we might exceed the rate limit. Or if the action takes more time than estimated, then we might limit performance more than we need to. If the action is very complicated, we may not be able to easily estimate how long it will take. And how would this work if the action is being performed in parallel by multiple threads?
If we need a more precise rate control method, then we could try to limit the number of times we execute the action within distinct windows of time. We count the number of executions until we reach the limit, and then we don’t allow any more until the count is reset to zero at the end of the time window. For example, suppose we’re aiming for a rate limit of 2 per second. The algorithm would allow only two executions between 0.0 and 1.0 seconds, two more between 1.0 and 2.0 seconds, and so on. This algorithm is effective at controlling the average rate, but it allows for bursts up to two times the rate limit. Using the parameters above, it would allow action executions at 0.8, 0.9, 1.1, and 1.2 seconds, which is 4 occurrences in less than half a second!
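A minimal sketch of such a fixed-window counter might look like this (the class and member names are illustrative, not from any source discussed here):

```csharp
using System;

// A simple fixed-window rate limiter: allows at most 'limit' actions
// per window, resetting the count whenever a new window begins.
// Illustrative sketch only; note the burst problem described above.
public class FixedWindowLimiter
{
    private readonly int _limit;
    private readonly long _windowMs;
    private long _windowStart;
    private int _count;

    public FixedWindowLimiter(int limit, TimeSpan window)
    {
        _limit = limit;
        _windowMs = (long)window.TotalMilliseconds;
        _windowStart = Environment.TickCount;
    }

    // Returns true if the action is allowed in the current window.
    public bool TryProceed()
    {
        long now = Environment.TickCount;
        if (now - _windowStart >= _windowMs)
        {
            // A new window has started: reset the counter.
            _windowStart = now;
            _count = 0;
        }
        if (_count < _limit)
        {
            _count++;
            return true;
        }
        return false;
    }
}
```

Nothing in this limiter prevents two executions at the very end of one window and two more at the very start of the next, which is exactly the burst described above.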
Yet another possible method is to use a rate limiting algorithm like the Leaky Bucket. Fortunately these aren’t terribly difficult to implement. But, like the algorithm above, they’re really designed for average rate limiting. They usually allow the rate limit to be exceeded for short bursts.
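For reference, here is a bare-bones Leaky Bucket sketch (the "bucket as a meter" variant; names and parameters are illustrative):

```csharp
using System;

// A bare-bones Leaky Bucket used as a meter: each action adds one unit
// to the bucket, which drains at a fixed rate. Actions are allowed
// while the bucket has room. Illustrative sketch only.
public class LeakyBucket
{
    private readonly double _capacity;
    private readonly double _leakPerMs;
    private double _level;
    private long _lastMs;

    public LeakyBucket(double capacity, double leakPerSecond)
    {
        _capacity = capacity;
        _leakPerMs = leakPerSecond / 1000.0;
        _lastMs = Environment.TickCount;
    }

    public bool TryProceed()
    {
        // Drain the bucket according to the time elapsed since last check.
        long now = Environment.TickCount;
        _level = Math.Max(0.0, _level - (now - _lastMs) * _leakPerMs);
        _lastMs = now;

        if (_level + 1.0 <= _capacity)
        {
            _level += 1.0;  // this action occupies one unit of the bucket
            return true;
        }
        return false;
    }
}
```

With a capacity greater than one, a full bucket's worth of actions can pass in a single instant, which is the short-burst behavior noted above.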
So what would we want of a better rate limiting method? Well, we would want it to limit the number of occurrences N of an action within any time period of length M (i.e. no bursts allowed). It would allow our code to perform as close to the rate limit as possible without exceeding it. And it would be able to enforce the rate limit whether the action is performed in a single thread or by multiple threads in parallel.
The Rate Limited Pizzeria
Suppose we’re in charge of making pizzas at a pizzeria and we have a pizza oven that can hold 6 pizzas at once. The pizza maker assembles the pizzas and then places them in the oven to bake. After each pizza bakes for exactly 30 minutes, another person removes it from the oven, slices it, and boxes it for delivery.
Each pizza that we place in the oven uses one unit of the oven’s capacity for exactly 30 minutes. It’s not possible to place more than 6 pizzas in the oven in any 30 minute period. In more academic terms, a system with processing capacity N that processes each item in time M has a maximum processing rate of N/M.
Note that the baking time and oven capacity still limit the rate at which pizzas can be placed in the oven even if more than one pizza maker is making the pizzas. When the oven is full, the pizza makers will have to wait in line to place their pizzas in the oven.
Building the RateGate
Using the pizzeria model, if we can create a software object that behaves like the pizza oven and then require that we “place a pizza in the oven” before performing an action, then we effectively limit the rate at which we perform the action. We’ll call this object a RateGate because we’ll have to “pass through” it before we perform the action that we want to rate limit.
Code that requires rate control will configure a RateGate instance, specifying the rate limit by number of occurrences and duration of the time period. Before performing the action, the code will call RateGate.WaitToProceed() which will block the current thread until the action is allowed to occur. Calling WaitToProceed() is like trying to add a pizza to the oven; if the oven is full (i.e. we have hit our rate limit), then the caller will have to wait until a pizza is removed from the oven. For example, the following code calls PerformAction() 1,000 times at a rate of at most 2 times per second:
// Create a RateGate that allows 2 occurrences per second.
using (var rateGate = new RateGate(2, TimeSpan.FromSeconds(1)))
{
    for (var i = 0; i < 1000; i++)
    {
        rateGate.WaitToProceed();
        PerformAction();
    }
}
As you can see, the RateGate is very simple to use. Let’s take a peek inside to see how it works.
The RateGate contains a counting semaphore that acts like the pizza oven. When a thread calls WaitToProceed(), it will have to wait in line until the semaphore value is less than the count (just like waiting until the number of pizzas in the oven is less than the oven’s capacity). It can then enter the semaphore (i.e. place a pizza in the oven) and proceed with the action. For our RateGate implementation in .NET 4.0, we’ll use a SemaphoreSlim as our counting semaphore. In previous versions of .NET, we could use a regular old Semaphore.
At some point we’ll need to exit the semaphore (i.e. take a pizza out of the oven). Before exiting WaitToProceed(), we compute the time when this should occur and add it to a queue. A queue is convenient because the next semaphore exit time can always be found at the front of the queue.
How do we then exit the semaphore at the time that we computed? We can’t depend on the thread that entered the semaphore to do it; that thread has moved on to other things like performing the action that we’re rate limiting. Instead we’ll take care of this in a separate management thread: a timer wakes it up, it exits the semaphore once for each queued exit time that has already passed, and then it schedules the next wake-up based on the exit time at the front of the queue.
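In outline, the two halves look something like this. This is a simplified sketch of the idea, not the actual RateGate source; error handling, disposal, and precise timer rescheduling are omitted, and the names are mine:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Simplified sketch of the RateGate internals described above.
public class MiniRateGate
{
    private readonly SemaphoreSlim _semaphore;        // "the oven"
    private readonly ConcurrentQueue<int> _exitTimes; // pending exit times
    private readonly int _timeUnitMs;                 // length of the time period
    private readonly Timer _exitTimer;                // the management side

    public MiniRateGate(int occurrences, TimeSpan timeUnit)
    {
        _semaphore = new SemaphoreSlim(occurrences, occurrences);
        _exitTimes = new ConcurrentQueue<int>();
        _timeUnitMs = (int)timeUnit.TotalMilliseconds;
        // First check after one time unit, then poll frequently.
        _exitTimer = new Timer(ExitTimerCallback, null, _timeUnitMs, 15);
    }

    public void WaitToProceed()
    {
        _semaphore.Wait();  // wait for an oven slot
        // Record when this slot should be released again.
        _exitTimes.Enqueue(unchecked(Environment.TickCount + _timeUnitMs));
    }

    // Timer callback: release every slot whose exit time has passed.
    private void ExitTimerCallback(object state)
    {
        int exitTime;
        while (_exitTimes.TryPeek(out exitTime)
               && unchecked(exitTime - Environment.TickCount) <= 0)
        {
            _exitTimes.TryDequeue(out exitTime);
            _semaphore.Release();  // take a pizza out of the oven
        }
    }
}
```

Note that the queue stays sorted for free: exit times are enqueued in the order the semaphore is entered, so the earliest one is always at the front.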
We have to ensure that our queue is thread safe because multiple threads may try to add semaphore exit times to the queue concurrently. And the management thread may also be simultaneously removing exit times. In .NET 4.0, we’ll use a ConcurrentQueue<int>, which will handle all of the synchronization internally. In previous versions of .NET, we could use a Queue<int>, but we would have to implement our own thread synchronization.
The RateGate is obviously very time dependent, so we need to be careful how we measure time. The system clock has a resolution of around 16ms, which should be more than sufficient for general rate limiting; there’s no need to use a high resolution timer. We could try to use the current system time but we can’t depend on it to be consistent, even if we avoid daylight saving time by using UTC. The system time could still be changed by a user, or it could be adjusted automatically when synchronizing with an external time source. A better clock for our purposes is System.Environment.TickCount, which gives us the number of milliseconds since the computer started. We just have to be careful because this value is stored as a 32-bit signed integer. The value starts at zero and increases for around 24.9 days until it wraps from Int32.MaxValue (2,147,483,647) to Int32.MinValue (-2,147,483,648). Then it continues increasing until it gets back to Int32.MaxValue. It’s easy enough to account for this value wrapping by using unchecked arithmetic to compute differences between time values. The downside is that we won’t be able to handle any time periods greater than around 49.8 days. But that should be okay; we’re not really looking to limit the number of occurrences over a period that long.
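The wrap-safe subtraction mentioned above can be written as a small helper (illustrative, not from the RateGate source):

```csharp
using System;

static class TickMath
{
    // Elapsed milliseconds between two Environment.TickCount readings.
    // The unchecked subtraction stays correct across the wrap from
    // Int32.MaxValue to Int32.MinValue; casting the difference to uint
    // extends the measurable range to just under 2^32 ms (~49.7 days).
    public static uint Elapsed(int startTicks, int endTicks)
    {
        return unchecked((uint)(endTicks - startTicks));
    }
}
```

For example, a reading taken 500 ms before the wrap and another taken 501 ms after it still yield an elapsed time of 1,001 ms.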
That covers the general idea behind the RateGate. Please check out the source code (.NET 4.0) for additional details. There are plenty of code comments that explain the “why” and “how”.
Rate Limiting a Loop
As a simple example, let’s use the code above that limits calls to PerformAction() to two per second. We’ll simulate PerformAction() by sleeping for 250ms and we’ll record its start and end times. The results with and without rate limiting are below. Notice the delay introduced by the RateGate that keeps the rate below the limit.
Executing 10 times with no rate limiting:

    Iteration    Start    Finish
        1            0       249
        2          250       500
        3          500       750
        4          751      1000
        5         1000      1250
        6         1250      1500
        7         1501      1750
        8         1750      2015
        9         2016      2265
       10         2266      2515

Executing 10 times with a rate limit of 2 per second:

    Iteration    Start    Finish
        1           17       266
        2          266       516
        3         1012      1262
        4         1273      1523
        5         2022      2272
        6         2287      2537
        7         3036      3286
        8         3301      3551
        9         4050      4300
       10         4315      4565
Rate Limiting LINQ
Another relatively simple application is using a RateGate to limit the rate at which a LINQ query is processed, via an IEnumerable<T> extension method:
public static IEnumerable<T> LimitRate<T>(this IEnumerable<T> sequence,
    int items, TimeSpan timePeriod)
{
    using (var rateGate = new RateGate(items, timePeriod))
    {
        foreach (var item in sequence)
        {
            rateGate.WaitToProceed();
            yield return item;
        }
    }
}
We can then use this extension method to limit the rate at which a sequence is enumerated:
var stopwatch = new Stopwatch();
var sequence = Enumerable.Range(1, 20);

stopwatch.Start();

// Limit the rate of processing the sequence to 5 items per second.
foreach (var number in sequence.LimitRate(5, TimeSpan.FromSeconds(1)))
    Console.WriteLine("{0,4} {1,6}", number, stopwatch.ElapsedMilliseconds);
Enumerating sequence with rate limit of 5 items per second:

       1     14
       2     33
       3     34
       4     34
       5     34
       6   1026
       7   1029
       8   1029
       9   1029
      10   1029
      11   2040
      12   2040
      13   2040
      14   2040
      15   2040
      16   3058
      17   3058
      18   3058
      19   3058
      20   3059
Be careful, though, if using this extension method in a PLINQ query. PLINQ can use a chunking strategy in which it pulls a chunk of items from the sequence before distributing the items for parallel processing. In the code below, suppose PLINQ starts by pulling a chunk of 50 items from the sequence. That part of the processing will be rate limited, taking around 10 seconds to complete. Then PLINQ will start calling PerformAction() for the items that it pulled from the sequence. The calls to PerformAction() will likely be executed in multiple threads, and without any rate limiting at all!
// Warning: Might not work as expected!
Enumerable.Range(1, 1000)
    .LimitRate(5, TimeSpan.FromSeconds(1))
    .AsParallel()
    .ForAll(x => PerformAction(x));
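One way to keep the limit effective under PLINQ (a sketch using the RateGate API shown earlier) is to apply the gate inside the parallel action itself, after PLINQ has already partitioned the work:

```csharp
// Apply the rate limit inside the parallel action, so every call to
// PerformAction passes through the gate regardless of how PLINQ
// chunks and schedules the items.
using (var rateGate = new RateGate(5, TimeSpan.FromSeconds(1)))
{
    Enumerable.Range(1, 1000)
        .AsParallel()
        .ForAll(x =>
        {
            rateGate.WaitToProceed();
            PerformAction(x);
        });
}
```

This is the same pattern as the Parallel.For example in the next section: the gate is shared across threads, and each thread waits its turn just before performing the action.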
Rate Limiting and Concurrency
As a final example, let’s use a RateGate to rate limit across multiple threads. We’ll use the Task Parallel Library to execute the rate limited action:
// Create a RateGate that allows 2 occurrences per second.
using (var rateGate = new RateGate(2, TimeSpan.FromSeconds(1)))
{
    Parallel.For(0, 10, i =>
    {
        rateGate.WaitToProceed();
        PerformAction();
    });
}
The TPL performs the processing out of sequence and introduces some delays, but the RateGate still limits the rate at which the items are processed.
Executing 10 times with no rate limiting:

    Iteration    Start    Finish
        1           50       299
        6           61       311
        2           66       316
        3          300       550
        7          317       567
        5          317       567
        4          550       800
        8          567       817
        9          568       817
       10          817      1067

Executing 10 times with a rate limit of 2 per second:

    Iteration    Start    Finish
        6            5       255
        1            7       257
        2         1980      2229
        7         1980      2230
        3         2987      3237
       10         2987      3237
        5         4001      4251
        4         4001      4251
        8         5015      5265
        9         5266      5515
Download the Code
I hope you find the RateGate interesting and/or useful! You can grab the .NET 4.0 source code here. Please leave a comment if you have suggestions for improvements or if you find a neat use for it. Cheers!
Tagged as: .NET, .NET 4.0, C#, LINQ, Multithreading
Comments
Great article Jack!
At some point you are saying: “The downside is that we won’t be able to handle any time periods greater than around 49.8 days. But that should be okay; we’re not really looking to limit the number of occurrences over a period that long.”
Could you explain a little bit what is the actual behaviour? What will happen if you specify a time period over 49.8 days in RateGate constructor?
Cheers,
Robert
@Robert: I chose to have the RateGate constructor throw an ArgumentOutOfRangeException when the time period is greater than 2^32 milliseconds to avoid the behavior entirely, but here’s why:
Environment.TickCount is a 32-bit (signed) integer. So the difference between any two Environment.TickCount values is at most (2^32 – 1) milliseconds, which is about 49.8 days. So that’s the maximum amount of time difference that we can measure.
We can try to measure a greater time difference, but the result won’t be correct. For example, the TickCount when the computer starts is zero. Exactly 50 days later, 4,320,000,000 milliseconds will have passed. Since the counter cycles every 2^32 milliseconds, the TickCount value at that time will be 4,320,000,000 − 4,294,967,296 = 25,032,704. If we take the difference in TickCount values (25,032,704 − 0) to compute the elapsed time, the result is only about 7 hours.
How safe do you think it would be to create multiple instances of this for different tenants in a multi-tenanted environment? Each client gets its own rate limiting. Do you see drawbacks to initializing one of these for each client on a server?