Design a Rate Limiter for a Public API

This example follows the system design process from lesson 01. We'll go through each step: requirements, estimation, high-level design, and deep dive.

Step 1: Requirements

Functional:

  • Limit requests per user based on subscription tier
  • Free tier: 100 requests/minute
  • Pro tier: 1,000 requests/minute
  • Return clear headers so clients know their remaining quota
  • Reject excess requests with 429 status code

Non-functional:

  • Must work across multiple API servers (distributed)
  • Must add under 1ms latency (can't slow down the request path)
  • Must handle 10,000 concurrent users

Step 2: Estimation

Users: 10,000
Average requests per user per minute: 30
Total: 300,000 requests/min = 5,000 QPS
Peak (3x): 15,000 QPS

Storage per user: one counter + TTL = ~50 bytes
Total memory: 10,000 × 50 bytes = 500 KB

This is tiny. A single Redis instance handles millions of keys and 100K+ operations/sec. No sharding needed.
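The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same back-of-envelope numbers; the constant names are illustrative, and the 50 bytes per user is the rough guess from the estimate, not a measured figure.

```python
# Back-of-envelope numbers from the estimation step.
USERS = 10_000
REQUESTS_PER_USER_PER_MIN = 30
PEAK_FACTOR = 3
BYTES_PER_USER = 50  # one counter + TTL (rough guess, not measured)

total_per_min = USERS * REQUESTS_PER_USER_PER_MIN  # requests per minute
avg_qps = total_per_min // 60                      # average queries per second
peak_qps = avg_qps * PEAK_FACTOR                   # assumed 3x peak
memory_kb = USERS * BYTES_PER_USER / 1000          # total counter memory in KB

print(total_per_min, avg_qps, peak_qps, memory_kb)
```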

Step 3: High-Level Design

        Client
          ↓
     API Gateway
          ↓
Rate Limiter (check Redis)
          │
     ┌────┴────┐
     ▼         ▼
   Allow     Reject
     ↓       (429)
  Backend
  Service

The rate limiter lives in the gateway layer. It checks Redis before forwarding the request. If the user is over their limit, it rejects the request immediately without touching the backend.

Algorithm choice: Fixed-window counter. Simple, fast, and good enough for per-minute limits. The tradeoff is the boundary problem (a user could send 100 requests at 0:59 and 100 at 1:00), but for most APIs this is acceptable. If bursts at boundaries become an issue, upgrade to a sliding window counter.
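To make the boundary problem concrete, here is a minimal in-memory sketch of a fixed-window counter. It is single-process only and never cleans up old windows, so it is not the Redis design described in this lesson; the class name FixedWindowCounter and the allow method are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class FixedWindowCounter:
    """In-memory fixed-window counter (single process, illustration only)."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (user_id, window index) -> count

    def allow(self, user_id: str, now: float) -> bool:
        # All timestamps inside the same minute map to the same window index.
        window = int(now // self.window_seconds)
        key = (user_id, window)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


limiter = FixedWindowCounter(limit=100)
# 100 requests at 0:59 and 100 more at 1:00 land in different windows,
# so all 200 are allowed even though they arrive one second apart.
late = sum(limiter.allow("u_456", now=59.0) for _ in range(100))
early = sum(limiter.allow("u_456", now=60.0) for _ in range(100))
over = limiter.allow("u_456", now=60.5)  # 101st request in the second window
```

Here late and early both come out as 100, while the extra request in the second window is rejected.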

Storage: Redis. We have multiple API servers, so we need shared state. A counter in one server's memory doesn't know about requests hitting other servers. Redis INCR is atomic and sub-millisecond.

Step 4: Deep Dive

Key Design

rate_limit:{user_id}:{window}
Example: rate_limit:u_456:2026-05-19T11:25

One key per user per minute. TTL of 60 seconds (auto-cleanup).
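A small sketch of how the key could be built. The function name rate_limit_key is hypothetical, and formatting the window in UTC is an assumption; the lesson does not say which time zone the window string uses.

```python
from datetime import datetime, timezone


def rate_limit_key(user_id: str, now: datetime) -> str:
    """Build the per-user, per-minute key by truncating the time to the minute."""
    window = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")
    return f"rate_limit:{user_id}:{window}"


ts = datetime(2026, 5, 19, 11, 25, 42, tzinfo=timezone.utc)
key = rate_limit_key("u_456", ts)
print(key)  # rate_limit:u_456:2026-05-19T11:25
```

Every request from the same user within the same minute maps to the same key, so one INCR per request is enough to count them.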

Request Flow

  1. Request arrives at the gateway
  2. Extract user ID from the auth token
  3. Look up their tier (cached in memory: free=100, pro=1000)
  4. INCR rate_limit:u_456:2026-05-19T11:25 in Redis
  5. If INCR returned 1 (first request in this window), set EXPIRE to 60 seconds
  6. If the counter exceeds the limit, return 429 Too Many Requests
  7. Otherwise, forward the request to the backend
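The flow above can be sketched as follows. To keep the example runnable without a Redis server, it uses FakeRedis, a hypothetical in-memory stand-in whose incr and expire methods are modeled on the Redis INCR and EXPIRE commands. The function name check_request and the TIER_LIMITS table are also illustrative.

```python
from dataclasses import dataclass, field

TIER_LIMITS = {"free": 100, "pro": 1000}


@dataclass
class FakeRedis:
    """In-memory stand-in for Redis (hypothetical, for illustration only)."""
    data: dict = field(default_factory=dict)
    ttls: dict = field(default_factory=dict)

    def incr(self, key: str) -> int:
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def expire(self, key: str, seconds: int) -> None:
        self.ttls[key] = seconds  # records the TTL; does not actually expire


def check_request(store, user_id: str, tier: str, window: str) -> bool:
    """Return True to forward the request, False to reject it with 429."""
    limit = TIER_LIMITS[tier]
    key = f"rate_limit:{user_id}:{window}"
    count = store.incr(key)    # count this request
    if count == 1:             # first request in this window
        store.expire(key, 60)  # so the key cleans itself up
    return count <= limit


store = FakeRedis()
results = [check_request(store, "u_456", "free", "2026-05-19T11:25")
           for _ in range(101)]
```

The first 100 calls return True and the 101st returns False. Note that INCR followed by EXPIRE is two separate commands; if the gateway crashes between them, the key is left without a TTL. A production version would likely combine them in a Lua script or a transaction.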

Response Headers

On success:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1716130020

On rejection:

HTTP/1.1 429 Too Many Requests
Retry-After: 34
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
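One way these headers could be computed, as a sketch. The function name rate_limit_headers and its parameters are assumptions; count is the value returned by INCR, and window_start is the Unix timestamp at which the current minute began.

```python
from typing import Dict, Tuple


def rate_limit_headers(limit: int, count: int,
                       window_start: int, now: int) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request given the counter value."""
    window_end = window_start + 60
    if count > limit:
        return 429, {
            "Retry-After": str(window_end - now),  # seconds until the window resets
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": "0",
        }
    return 200, {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(limit - count),
        "X-RateLimit-Reset": str(window_end),      # Unix time of the reset
    }


ok_status, ok_headers = rate_limit_headers(100, 27, 1716129960, 1716129975)
no_status, no_headers = rate_limit_headers(100, 101, 1716129960, 1716129986)
```

With these inputs, the first call reproduces the success example (73 remaining, reset at 1716130020) and the second reproduces the rejection example (Retry-After of 34 seconds).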

Edge Cases

Redis is down: fail open (allow the request). A few seconds without rate limiting is better than rejecting all traffic. Alert the team.
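Failing open can be sketched as a thin wrapper around whatever check the gateway runs. The names allow_with_fail_open, broken_check, and deny_check are hypothetical; in a real gateway you would probably catch only the client library's connection and timeout errors, and the log call would feed an alert.

```python
import logging

logger = logging.getLogger("rate_limiter")


def allow_with_fail_open(check, *args) -> bool:
    """Run a limiter check; allow the request if the check itself fails."""
    try:
        return check(*args)
    except Exception:  # broad on purpose here; e.g. connection error or timeout
        logger.exception("rate limiter unavailable, failing open")
        return True


def broken_check(user_id: str) -> bool:
    """Simulates Redis being unreachable."""
    raise ConnectionError("redis unreachable")


def deny_check(user_id: str) -> bool:
    """Simulates a healthy limiter that says the user is over the limit."""
    return False


outage_result = allow_with_fail_open(broken_check, "u_456")  # allowed
normal_result = allow_with_fail_open(deny_check, "u_456")    # still rejected
```

The wrapper only changes behavior when the check raises; a normal "over limit" answer still results in a rejection.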

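Clock skew between servers: only matters near a window boundary. Each server builds the window string from its own clock, so two skewed servers can briefly write to different keys for the same user. With NTP-synced clocks the skew is typically a few milliseconds, which is negligible for per-minute limits. To remove the dependency entirely, derive the window from Redis's own clock (the TIME command) instead of local time.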

User switches from free to pro: the tier lookup is cached with a short TTL (5 min). They get the new limit within minutes.
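A minimal sketch of a tier cache with a 5-minute TTL. The class name TierCache and the fetch_tier callback are hypothetical; the callback stands in for whatever database or service holds the user's subscription. The clock is injectable so the expiry can be demonstrated without waiting.

```python
import time


class TierCache:
    """Caches each user's tier in memory for ttl_seconds."""

    def __init__(self, fetch_tier, ttl_seconds=300, clock=time.monotonic):
        self.fetch_tier = fetch_tier  # slow lookup, e.g. a database call
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}             # user_id -> (tier, expires_at)

    def get(self, user_id):
        now = self.clock()
        entry = self.entries.get(user_id)
        if entry and entry[1] > now:
            return entry[0]           # still fresh, skip the lookup
        tier = self.fetch_tier(user_id)
        self.entries[user_id] = (tier, now + self.ttl)
        return tier


db = {"u_456": "free"}
fake_now = [0.0]
cache = TierCache(db.get, clock=lambda: fake_now[0])

first = cache.get("u_456")  # "free", loaded from db
db["u_456"] = "pro"         # user upgrades
stale = cache.get("u_456")  # still "free" until the entry expires
fake_now[0] = 301.0         # just over five minutes later
fresh = cache.get("u_456")  # "pro"
```

If upgrades need to take effect immediately, the upgrade handler could also evict the user's cache entry, at the cost of notifying every gateway server.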

Burst at window boundary: this is the fixed-window boundary problem. A user could send 100 requests at 0:59 and 100 more at 1:00, which is 200 requests in about a second. For most APIs this is acceptable. If not, switch to a sliding window counter (a weighted average of the current and previous windows).
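The sliding window counter can be sketched as below. This follows the common formulation in which the previous window's count is weighted by how much of it still overlaps the last 60 seconds; the function names and the strict "less than limit" check are assumptions, not something this lesson specifies.

```python
def sliding_window_estimate(prev_count: int, curr_count: int,
                            elapsed: float, window_seconds: int = 60) -> float:
    """Estimate requests in the last window_seconds.

    elapsed is the number of seconds since the current window started.
    """
    prev_weight = (window_seconds - elapsed) / window_seconds
    return prev_count * prev_weight + curr_count


def allow(prev_count: int, curr_count: int, elapsed: float,
          limit: int, window_seconds: int = 60) -> bool:
    """Allow the next request only if the estimate is below the limit."""
    estimate = sliding_window_estimate(prev_count, curr_count,
                                       elapsed, window_seconds)
    return estimate < limit


# 100 requests in the previous window, now exactly at the boundary (1:00):
at_boundary = allow(prev_count=100, curr_count=0, elapsed=0, limit=100)
# Thirty seconds later, half of the previous window has slid out:
half_way = sliding_window_estimate(prev_count=100, curr_count=0, elapsed=30)
```

At the boundary the previous window still counts in full, so the burst at 1:00 is rejected; thirty seconds in, the estimate has dropped to 50 and requests are allowed again. The cost is a second counter read per request.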

Concepts Used

Concept                    Lesson   How it's used here
System design process      01       Requirements → Estimation → Design → Deep dive
Caching                    07       Tier lookup cached in memory
API gateway                17       Rate limiting happens at the gateway layer
Rate limiting algorithms   19       Fixed-window counter in Redis
Designing for failure      21       Fail open if Redis is down