понедельник, 15 ноября 2021 г.

Non-formal overview of Token Bucket rate-limiting algorithm

What is the motivation to write this article about the Token Bucket algorithm?

I am the author of Bucket4j rate-limiting library, and too often I deal with fact that users want to use my library, but do not want to learn the foundations of the token-bucket algorithm and math model. 

So, to make my users to be motivated to learn about Token-Bucket I decided to write this article in a non-traditional style. Instead of being focusing on math and algorithms, this article step-by-step describes the problems that you can be faced with when implementing a rate-limiting algorithm by yourself from scratch. 

I hope that when users found from how many problems Token Bucket protects, they will become more motivated to follow and learn this Wikipedia link

Simple rate-limiter implementation

Let's implement a simple algorithm for limitation defined in terms N events per M rolling time window.
import java.util.LinkedList;

/**
* The naive solution for rate limiter which potentially leads to crash JVM with out-of memory error.
*/
public class NoneEffectiveRateLimiter {

private long availableTokens;
private final long periodMillis;

private LinkedList<Issue> issuedTokens = new LinkedList<>();

/**
* Creates instance of rate limiter which provides guarantee that consumption rate will be >= tokens/periodMillis
*/
public NoneEffectiveRateLimiter(long tokens, long periodMillis) {
this.availableTokens = tokens;
this.periodMillis = periodMillis;
}

synchronized public boolean tryConsume(int numberTokens) {
long nowMillis = System.currentTimeMillis();
clearObsoleteIssues(nowMillis);

if (availableTokens < numberTokens) {
// has no requested tokens in the bucket
return false;
} else {
issuedTokens.addLast(new Issue(numberTokens, nowMillis));
availableTokens -= numberTokens;
return true;
}
}

private void clearObsoleteIssues(long nowMillis) {
while (!issuedTokens.isEmpty()) {
Issue issue = issuedTokens.getFirst();
if (nowMillis - issue.timestampMillis > periodMillis) {
availableTokens += issue.tokens;
issuedTokens.removeFirst();
} else {
return;
}
}
}

private static final class Issue {
private final long tokens;
private final long timestampMillis;

private Issue(long tokens, long timestampMillis) {
this.tokens = tokens;
this.timestampMillis = timestampMillis;
}
}

private static final class SelfTest {

public static void main(String[] args) {
// 100 tokens per 1 second
NoneEffectiveRateLimiter limiter = new NoneEffectiveRateLimiter(100, 1000);

long startMillis = System.currentTimeMillis();
long consumed = 0;
while (System.currentTimeMillis() - startMillis < 10000) {
if (limiter.tryConsume(1)) {
consumed++;
}
}
System.out.println(consumed);
}

}

}
What are the problems with this naive implementation?
The solution above consumes memory in a non-effective way, in order to control the rate, it stores each particular fact about issued tokens at least for ```periodMillis```.
And this behavior can lead to the following problems:
  • JVM can crash with out of memory error in case of rate of events to high or the period is too large.
  • If JVM can survive, the problem of unnecessary promotion from young to tenured memory regions still exists.
Imagine that minor garbage collections happen every 5 seconds, and you have to deal with 1 minute period of the limiter. It is obvious that memory for each issued permit is primarily allocated to a new generation and then will be promoted to the tenured generation because the time to live of the permit is enough to survive several collections in the new generation. Hence this naive implementation will lead to more frequent Full GC pauses.

Attempt to optimize memory consumption

Ok, the previous attempt failed, but I can optimize memory consumption by refilling available tokens in the background thread instead of storing each fact about consumption.

Let's do it:
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
* The ineffective solution for rate limiter which wast CPU to refill tokens in background executor
*/
public class SchedulerBasedTokenBucket {

private final long maxTokens;
private long availableTokens;

/**
* Creates instance of rate limiter which provides guarantee that consumption rate will be >= tokens/periodMillis
*/
public SchedulerBasedTokenBucket(long tokens, long periodMillis, ScheduledExecutorService scheduler) {
long millisToRefillOneToken = periodMillis / tokens;
scheduler.scheduleAtFixedRate(this::addToken, periodMillis, millisToRefillOneToken, TimeUnit.MILLISECONDS);

this.maxTokens = tokens;
this.availableTokens = tokens;
}

synchronized private void addToken() {
availableTokens = Math.min(maxTokens, availableTokens + 1);
}

synchronized public boolean tryConsume(int numberTokens) {
if (availableTokens < numberTokens) {
return false;
} else {
availableTokens -= numberTokens;
return true;
}
}

private static final class SelfTest {

public static void main(String[] args) {
// 100 tokens per 1 second
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
SchedulerBasedTokenBucket limiter = new SchedulerBasedTokenBucket(100, 1000, scheduler);

long startMillis = System.currentTimeMillis();
long consumed = 0;
while (System.currentTimeMillis() - startMillis < 10000) {
if (limiter.tryConsume(1)) {
consumed++;
}
}
scheduler.shutdown();
System.out.println(consumed);
}

}

}
What are the problems? This implementation is not self-sufficient because requires ScheduledExecutorService, and this leads to the following disadvantages:
  • Requires at least one background thread.
  • CPU can be heavily consumed in case of multiple limiters or high granular refill. For example, a limit rate of 100 events per 1 second, requires 100 executions per second inside scheduler, for a case with 100 independent limiters it needs to 10000 executions per second.
  • CPU is used for scheduling background tasks even if the limiter was unused for a long time.
  • Requires tricky management of memory and tasks: it is needed to cancel tasks in the scheduler when the limiter is not needed anymore, also references(task in scheduler holds a strong reference to limiter which makes limiter reachable).

So, what kinds of problems can be solved by the token bucket?

Token bucket algorithm solves rate-limiting problems in the way:
  • Which requires a small and predefined amount of memory, independently of the incoming request rate the memory consumed by token-bucket is always constant.
  • There are no additional background threads required by token-bucket.

Example of Token-bucket algorithm implementation

The formal description of Token Bucket can be found in this Wikipedia article. I do not want to copy this article there, but I have a major note: pay attention that wikipedia's article just describes a formal model of token-bucket, it is not the algorithm by itself. Concrete implementations of token-bucket can significantly differ from each other by details, but any implementation which called itself a "token-bucket" must produce results in a way that can be checked by this model.

Example of a pretty basic java token-bucket implementation:
/**
* The minimalistic token-bucket implementation
*/
public class MinimalisticTokenBucket {

private final long capacity;
private final double refillTokensPerOneMillis;

private double availableTokens;
private long lastRefillTimestamp;

/**
* Creates token-bucket with specified capacity and refill rate equals to refillTokens/refillPeriodMillis
*/
public MinimalisticTokenBucket(long capacity, long refillTokens, long refillPeriodMillis) {
this.capacity = capacity;
this.refillTokensPerOneMillis = (double) refillTokens / (double) refillPeriodMillis;

this.availableTokens = capacity;
this.lastRefillTimestamp = System.currentTimeMillis();
}

synchronized public boolean tryConsume(int numberTokens) {
refill();
if (availableTokens < numberTokens) {
return false;
} else {
availableTokens -= numberTokens;
return true;
}
}

private void refill() {
long currentTimeMillis = System.currentTimeMillis();
if (currentTimeMillis > lastRefillTimestamp) {
long millisSinceLastRefill = currentTimeMillis - lastRefillTimestamp;
double refill = millisSinceLastRefill * refillTokensPerOneMillis;
this.availableTokens = Math.min(capacity, availableTokens + refill);
this.lastRefillTimestamp = currentTimeMillis;
}
}

private static final class SelfTest {

public static void main(String[] args) {
// 100 tokens per 1 second
MinimalisticTokenBucket limiter = new MinimalisticTokenBucket(100, 100, 1000);

long startMillis = System.currentTimeMillis();
long consumed = 0;
while (System.currentTimeMillis() - startMillis < 10000) {
if (limiter.tryConsume(1)) {
consumed++;
}
}
System.out.println(consumed);
}

}

}
The MinimalisticTokenBucket is just a learning example that helps to understand how to implement token-bucket quickly, please do not make assumptions about how to Bucket4j internally works by looking at the code above.


Thank you for reading.