
tamagoyaki

leaking

A rate limiter is used to control the rate of traffic sent by a client or a service. Rate limiting refers to preventing the frequency of an operation from exceeding some constraint.

Introduction

I’ve gathered some insightful concepts from the book “System Design Interview,” along with valuable resources on rate-limiting strategies and techniques.

These include “Rate-limiting strategies and techniques” from Google Cloud Architecture, an engaging article called “An alternative approach to rate limiting” on Figma’s blog, and an informative guide on rate limits titled “Rate limits: Standard v1.1 Twitter.”

These materials offer fresh perspectives and practical advice for optimizing system performance while ensuring fair usage. Dive into these captivating resources to discover effective ways to tackle rate-limiting challenges in your designs.

Reasons

  • Put in place a defensive measure for services (this is quite essential for large companies)
  • Protect services from excessive use
  • Enforce a "fair use" policy
  • Prevent resource starvation
  • Reduce cost

Requirements

  • Accurately limit excessive requests
  • Low latency: the limiter should not slow down the service
  • Low memory usage
  • Distributed operation
  • Exception handling
  • High fault tolerance

Throttling algorithms

Token bucket

  • Each request consumes one token.
  • Tokens are replenished at regular intervals (every n units of time).
  • If no tokens are available:
    • The request can be stored for later handling.
    • Alternatively, the request can be dropped.

[Figure: token bucket]

Variables (Token bucket)

  • Bucket size
  • Refill rate
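The two variables above are enough to sketch the algorithm. Here is a minimal Python sketch (the class and method names are illustrative, not from any of the cited sources; the caller supplies timestamps, e.g. from time.monotonic(), to keep the logic deterministic, and a real limiter would also need locking for concurrent access):

```python
class TokenBucket:
    """Token bucket sketch: each request consumes one token; tokens are
    replenished continuously at a fixed rate, capped at the bucket size."""

    def __init__(self, bucket_size, refill_rate, now=0.0):
        self.bucket_size = bucket_size  # maximum tokens the bucket holds
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = bucket_size       # the bucket starts full
        self.last_refill = now

    def allow(self, now):
        # Replenish tokens for the elapsed time, capped at bucket_size.
        self.tokens = min(self.bucket_size,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # this request consumes one token
            return True
        return False          # no tokens left: drop or queue the request
```

With a bucket of size 2 and a refill rate of 1 token/second, two back-to-back requests pass, a third is rejected, and one second later a request passes again.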

Leaky bucket

This algorithm uses a FIFO queue to manage requests. New requests are added to the queue (the bucket). If the queue is already at its maximum capacity, there are two options: the request can either be dropped or saved for later processing.

At regular intervals, requests are pulled from the queue and processed.

[Figure: leaky bucket]

Pros (Leaky bucket)

  • Memory efficient, thanks to the limited queue size
  • Requests are processed at a fixed rate

Cons (Leaky bucket)

  • During a burst of traffic, the queue fills with older requests; if those are not processed in time, newer requests are rate limited (starved).

Variables (Leaky bucket)

  • The bucket size is the maximum capacity of the queue: the number of requests it can hold at any given time.
  • The outflow rate is the number of requests pulled from the queue and processed per fixed time interval.
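A minimal Python sketch of those two variables in action (names are illustrative; in practice leak() would be driven by a timer or background worker rather than called by hand):

```python
from collections import deque


class LeakyBucket:
    """Leaky bucket sketch: requests enter a bounded FIFO queue and
    'leak' out for processing at a fixed rate."""

    def __init__(self, bucket_size):
        self.bucket_size = bucket_size  # maximum requests the queue can hold
        self.queue = deque()

    def add(self, request):
        """Enqueue a request; reject it if the bucket is full."""
        if len(self.queue) >= self.bucket_size:
            return False  # bucket full: drop (or save for later)
        self.queue.append(request)
        return True

    def leak(self, outflow_rate):
        """Called at a fixed interval: dequeue up to outflow_rate requests."""
        processed = []
        for _ in range(min(outflow_rate, len(self.queue))):
            processed.append(self.queue.popleft())
        return processed
```

Because the queue is bounded and requests leave at a fixed rate, memory use is capped, which is exactly the pro listed above.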

Fixed window counter

  • The algorithm partitions the timeline into time windows of fixed size.
  • Each request increments the counter of the current window; once the counter reaches the limit, further requests in that window are dropped.

[Figure: fixed window counter]

  • Window edges can be a challenge: a burst of traffic just before and just after a window boundary can let through up to twice the allowed number of requests.

[Figure: fixed window counter, burst at the window edge]

Pros (fixed window)

  • Memory efficient
  • Resetting the quota at fixed intervals fits some use cases
  • Easy to understand

Variables (fixed window)

  • Counter per window
  • Window time
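A minimal Python sketch using those two variables (names are illustrative; timestamps are passed in explicitly to keep the example deterministic):

```python
class FixedWindowCounter:
    """Fixed window counter sketch: the timeline is split into windows of
    fixed size, and each window gets its own request counter."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests      # counter limit per window
        self.window_seconds = window_seconds  # window time
        self.current_window = None
        self.counter = 0

    def allow(self, now):
        window = int(now // self.window_seconds)  # index of the current window
        if window != self.current_window:
            self.current_window = window  # entered a new window: reset counter
            self.counter = 0
        if self.counter < self.max_requests:
            self.counter += 1
            return True
        return False  # window quota exhausted
```

Note that a rejected request at the end of one window can be immediately followed by accepted requests at the start of the next, which is the edge problem described above.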

Sliding window log

  • Keep a log of request timestamps.
  • When a new request comes in, remove all outdated timestamps, defined as those older than the start of the current time window.
  • Add the timestamp of the new request to the log.
  • If the log size exceeds the maximum allowed, drop (or save) the request; otherwise handle it normally.

[Figure: sliding window log]

Pros (sliding window log)

  • The algorithm is very accurate

Cons (sliding window log)

  • Memory intensive: a timestamp is stored for every request, including rejected ones

Variables (sliding window log)

  • Request per minute
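The steps above can be sketched in a few lines of Python (names are illustrative; following the description, a rejected request's timestamp still stays in the log, which is why the algorithm is memory intensive):

```python
from collections import deque


class SlidingWindowLog:
    """Sliding window log sketch: keep one timestamp per request and count
    how many fall inside the window ending at the current time."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.log = deque()  # timestamps, oldest first

    def allow(self, now):
        # Remove timestamps older than the current window.
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        # Add the new request's timestamp, then check the log size.
        self.log.append(now)
        return len(self.log) <= self.max_requests
```

Because the window slides with each request, this avoids the fixed window's edge problem entirely, at the cost of one stored timestamp per request.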

Sliding window

Similar to the fixed window approach, this algorithm employs a timer and a counter to manage the maximum number of available requests per user. However, in this case, the timer begins counting only after the first request is made.

This algorithm combines the benefits of the fixed window counter and the sliding window log. Unlike the previous methods, the sliding window implementation ensures that newer requests are not starved, and the system is not overwhelmed by bursts of requests.

[Figure: sliding window]

Pros (sliding window)

  • Smooths out spikes by basing the current rate on the average rate of the previous window.

  • It is memory efficient.

Variables (sliding window)

  • Request per minute
  • Prev minute
  • Prev percentage
  • Current minute
  • Current time
  • Current percentage
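A common way to realize this (a sketch, not necessarily the exact variant the book describes) is the sliding window counter: keep the previous and current minute's counters, and estimate the rate as the current count plus the previous count weighted by the percentage of the previous window still covered by the sliding window. All names below are illustrative:

```python
class SlidingWindowCounter:
    """Sliding window counter sketch: approximates a true sliding window by
    weighting the previous window's count by how much of it still overlaps
    the window ending at the current time."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.current_window = 0   # index of the current fixed window
        self.current_count = 0
        self.prev_count = 0

    def allow(self, now):
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # Slide forward: the old current window becomes the previous one
            # (unless more than a full window has passed with no traffic).
            self.prev_count = (self.current_count
                               if window == self.current_window + 1 else 0)
            self.current_count = 0
            self.current_window = window
        # Fraction of the previous window still inside the sliding window.
        prev_fraction = 1 - (now % self.window_seconds) / self.window_seconds
        estimated = self.prev_count * prev_fraction + self.current_count
        if estimated < self.max_requests:
            self.current_count += 1
            return True
        return False
```

Because the previous window's count decays smoothly as time advances, a burst in one window suppresses the start of the next instead of doubling up at the edge, while memory use stays at two counters per user.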

References

  • "System Design Interview" (book)
  • "Rate-limiting strategies and techniques", Google Cloud Architecture Center
  • "An alternative approach to rate limiting", Figma blog
  • "Rate limits: Standard v1.1", Twitter