leaking

A rate limiter is used to control the rate of traffic send by a client or a service. rate limit refers to preventing the frequency of an operation from exceeding some constraint.

Introduction

I’ve gathered some insightful concepts from the book “System Design Interview,” along with valuable resources on rate-limiting strategies and techniques.

These include “Rate-limiting strategies and techniques” from Google Cloud Architecture, an engaging article called “An alternative approach to rate limiting” on Figma’s blog, and an informative guide on rate limits titled “Rate limits: Standard v1.1 Twitter.”

These materials offer fresh perspectives and practical advice for optimizing system performance while ensuring fair usage. Dive into these captivating resources to discover effective ways to tackle rate-limiting challenges in your designs.

Reasons

to put in place a defensive measure for services (this is quite essential for large companies)
protect services from excessive use
put in place a “fair use policy”
Prevent resource starvation
reduce cost

Requirements

accurate limit excessive request
low latency, the limiter should not slow down the service
use little memory
distributed
exception handling
high fault tolerance

Token bucket

Each request utilizes one token.
Tokens are replenished at regular intervals of time (every n amount).
In case there are no tokens available:
- The request can be stored for later handling.
- Alternatively, the request can be dropped.

tokenbucket

Variables (Token bucket)

Bucket size
Refill rate

Leaky bucket

This algorithm utilizes a queue (FIFO) to manage requests. If the bucket is already at its maximum capacity, any new requests will be ignored.

The process involves adding new requests to the queue (bucket). If the queue is full, there are two options: the request can either be dropped or saved for later processing.

At regular intervals, requests are pulled from the queue and processed.

leakingbucket

Pros (Leaky bucket)

Memory efficient, limited queue size
Request are processed at a fixed rate

Cons (Leaky bucket)

In the event of a burst of traffic, new requests can merge with older requests if the latter have not been processed in a timely manner.

Variables (Leaky bucket)

The bucket size refers to the maximum capacity of the bucket, indicating the number of requests it can hold at any given time.
The overflow rate determines the number of requests that can be processed within a fixed rate or time interval. It represents the rate at which the bucket can handle incoming requests without overflowing.

Fixed window counter

The algorithm partitions the timeline into time windows of fixed size.
For each request, the counter corresponding to each window is incremented.

fixedwindow1

Time edges can be a challenge.
This can result in the possibility of having twice as many allowed requests.

fixedwindow2

Pros (fixed window)

Memory efficient
Fixed counter per n amount of time can fit some use cases
Easy to understand

Variables (fixed window)

Counter per window
Window time

Sliding window log

keep track of the timestamps (request)
when a new request comes in remove all outdated timestamps, those are defined to be older than the current time window
add timestamp of the new request to the log
if the log size is bigger than the max allowed drop / save else handle normally

slidingwindowlog

Pros (sliding window log)

The algorithm is very accurate

Cons (sliding window log)

Memory hard

Variables (sliding window log)

Request per minute

Sliding window

Similar to the fixed window approach, this algorithm employs a timer and a counter to manage the maximum number of available requests per user. However, in this case, the timer begins counting only after the first request is made.

This algorithm combines the benefits of both the bucket and fixed window approaches. Unlike the previous methods, the sliding window implementation ensures that newer requests are not starved, and the system is not overwhelmed by bursts of requests.

slidingwindow

Pros (sliding window)

Smooths out spikes by basing the current rate on the average rate of the previous window.
It is memory efficient.

Variables (sliding window)

Request per minute
Prev minute
Prev percentage
Current minute
Current time
Current percentage

References

system design interview Rate-limiting strategies and techniques, An alternative approach to rate limiting & Rate limits: Standard v1.1 twitter