Design a Rate Limiter for a Public API

This example follows the system design process from lesson 01. We'll go through each step: requirements, estimation, high-level design, and deep dive.

Step 1: Requirements

Functional:

Limit requests per user based on subscription tier
Free tier: 100 requests/minute
Pro tier: 1,000 requests/minute
Return clear headers so clients know their remaining quota
Reject excess requests with a 429 Too Many Requests status code

Non-functional:

Must work across multiple API servers (distributed)
Must add less than 1 ms of latency (it sits on every request's path)
Must handle 10,000 concurrent users

Step 2: Estimation

Users: 10,000
Average requests per user per minute: 30
Total: 300,000 requests/min = 5,000 QPS
Peak (3x): 15,000 QPS

Storage per user: one counter + TTL = ~50 bytes
Total memory: 10,000 × 50 bytes = 500 KB

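This is tiny. A single Redis instance handles millions of keys and 100K+ operations/sec. Even at peak, the load is at most about 30,000 Redis operations/sec (15,000 QPS with up to two commands per request). No sharding needed.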
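
The Step 2 arithmetic can be checked in a few lines. The inputs are the estimates above. The two-commands-per-request upper bound assumes the INCR-then-EXPIRE flow from Step 4, and the 50-byte figure is the rough per-user estimate, not a measured Redis number.

```python
# Back-of-envelope check of the Step 2 estimates (inputs are this design's assumptions).
USERS = 10_000
REQ_PER_USER_PER_MIN = 30
PEAK_FACTOR = 3
BYTES_PER_USER = 50          # rough estimate; real Redis per-key overhead can be higher
REDIS_CMDS_PER_REQUEST = 2   # upper bound: INCR, plus EXPIRE on a window's first request

avg_qps = USERS * REQ_PER_USER_PER_MIN / 60      # average requests per second
peak_qps = avg_qps * PEAK_FACTOR                 # assumed 3x peak
peak_redis_ops = peak_qps * REDIS_CMDS_PER_REQUEST
memory_bytes = USERS * BYTES_PER_USER

print(f"average: {avg_qps:,.0f} QPS, peak: {peak_qps:,.0f} QPS")
print(f"peak Redis load: at most {peak_redis_ops:,.0f} ops/sec")
print(f"counter memory: ~{memory_bytes / 1000:,.0f} KB")
```

Even a several-fold error in the per-key size leaves the total in single-digit megabytes, so the conclusion does not depend on the exact figure.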

Step 3: High-Level Design

    Client
       ↓
   API Gateway
       ↓
  Rate Limiter (check Redis)
       │
  ┌────┴────┐
  ▼         ▼
Allow     Reject
  ↓       (429)
Backend
Service

The rate limiter lives in the gateway layer. It checks Redis before forwarding the request. If the user is over the limit, it rejects the request immediately without touching the backend.

Algorithm choice: Fixed-window counter. It is simple, fast, and good enough for per-minute limits. The tradeoff is the boundary problem: a free-tier user could send 100 requests at 0:59 and another 100 at 1:00, so 200 requests land within about one second even though the limit is 100 per minute. For most APIs this is acceptable. If bursts at boundaries become an issue, upgrade to a sliding window counter.
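
A minimal single-process sketch of the fixed-window counter, with a Python dict standing in for Redis (the class name and in-memory storage are illustrative, and old windows are never cleaned up here; Redis TTLs handle that in the real design). It reproduces the boundary problem: 100 requests at second 59 and 100 at second 60 are all accepted.

```python
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed-window counter kept in process memory (illustration only)."""

    def __init__(self, limit: int, window_seconds: int = 60) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str, now: float) -> bool:
        # Every timestamp in the same window maps to the same window index.
        window = int(now // self.window_seconds)
        key = (user_id, window)
        self.counts[key] += 1
        return self.counts[key] <= self.limit

limiter = FixedWindowLimiter(limit=100)
late = sum(limiter.allow("u_456", 59.0) for _ in range(100))   # end of window 0
early = sum(limiter.allow("u_456", 60.0) for _ in range(100))  # start of window 1
extra = limiter.allow("u_456", 60.5)  # window 1 is already at its limit
print(late + early, extra)  # 200 False
```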

Storage: Redis. We have multiple API servers, so we need shared state. A counter in one server's memory doesn't know about requests hitting other servers. Redis INCR is atomic and sub-millisecond.
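
A small simulation shows why per-server counters are not enough. It assumes a hypothetical setup of three API servers behind a round-robin load balancer and one free-tier user sending 300 requests within a single minute; a plain integer stands in for the shared Redis counter.

```python
from itertools import cycle

LIMIT = 100
SERVERS = 3

local_counts = [0] * SERVERS  # each server's own in-memory counter for this user
shared_count = 0              # one counter all servers see (what Redis provides)

allowed_local = allowed_shared = 0
servers = cycle(range(SERVERS))  # round-robin load balancing
for _ in range(300):             # 300 requests in the same minute
    s = next(servers)
    local_counts[s] += 1
    if local_counts[s] <= LIMIT:
        allowed_local += 1
    shared_count += 1
    if shared_count <= LIMIT:
        allowed_shared += 1

print(allowed_local, allowed_shared)  # 300 100
```

With local counters each server sees only a third of the traffic, so the effective limit grows with the number of servers.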

Step 4: Deep Dive

Key Design

rate_limit:{user_id}:{window}
Example: rate_limit:u_456:2026-05-19T11:25

One key per user per minute. TTL of 60 seconds (auto-cleanup).
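
A sketch of building the key, assuming the window string is the UTC minute in ISO 8601 form (the example key does not state a timezone, so UTC is an assumption). `rate_limit_key` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def rate_limit_key(user_id: str, now: datetime) -> str:
    """Build rate_limit:{user_id}:{window}, where the window is the UTC minute."""
    window = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")
    return f"rate_limit:{user_id}:{window}"

now = datetime(2026, 5, 19, 11, 25, 42, tzinfo=timezone.utc)
print(rate_limit_key("u_456", now))  # rate_limit:u_456:2026-05-19T11:25
```

Every request in the same minute maps to the same key, and the first request of the next minute starts a fresh counter.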

Request Flow

Request arrives at the gateway
Extract user ID from the auth token
Look up their tier (cached in memory: free=100, pro=1000)
INCR rate_limit:u_456:2026-05-19T11:25 in Redis, which returns the new count
If the count is 1 (first request in this window), set EXPIRE to 60 seconds
If the count exceeds the limit, return 429 Too Many Requests
Otherwise, forward the request to the backend
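
The request flow, sketched in Python. `check_rate_limit` and `InMemoryRedis` are hypothetical names: the stand-in class only mimics the `incr` and `expire` method names of the redis-py client so the sketch runs without a server, and it records TTLs without actually expiring keys. Tier limits come from Step 1; the window string is assumed to be the UTC minute.

```python
from datetime import datetime, timezone

TIER_LIMITS = {"free": 100, "pro": 1000}
WINDOW_SECONDS = 60

class InMemoryRedis:
    """Stand-in with redis-py-style incr/expire; records TTLs but never expires keys."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}
        self.ttls: dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def expire(self, key: str, seconds: int) -> bool:
        if key not in self.data:
            return False
        self.ttls[key] = seconds
        return True

def check_rate_limit(redis, user_id: str, tier: str, now: datetime) -> tuple[bool, int, int]:
    """Return (allowed, limit, count) for one request."""
    limit = TIER_LIMITS[tier]
    window = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")
    key = f"rate_limit:{user_id}:{window}"
    count = redis.incr(key)                # INCR returns the new count
    if count == 1:
        redis.expire(key, WINDOW_SECONDS)  # first request in this window: schedule cleanup
    return count <= limit, limit, count

r = InMemoryRedis()
now = datetime(2026, 5, 19, 11, 25, 10, tzinfo=timezone.utc)
decisions = [check_rate_limit(r, "u_456", "free", now)[0] for _ in range(101)]
print(decisions.count(True), decisions[-1])  # 100 False
```

Because INCR and EXPIRE are separate commands here, a gateway crash between them would leave a key with no TTL. With a real Redis, sending both in one MULTI/EXEC transaction or a Lua script avoids that; on Redis 7.0+, EXPIRE's NX option only sets a TTL when none exists, so the EXPIRE can be sent alongside every INCR.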

Response Headers

On success:

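HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1779189960

On rejection:

HTTP/1.1 429 Too Many Requests
Retry-After: 34
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1779189960

X-RateLimit-Reset is the Unix time (in seconds) when the current window ends; assuming window strings are UTC, 1779189960 is 2026-05-19T11:26:00Z, the end of the example window. Retry-After is the number of seconds until then, so a client rejected at 11:25:26 waits 34 seconds.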
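
A sketch of deriving these headers from the count returned by INCR. `window_end` and `rate_limit_headers` are hypothetical helpers; the sketch assumes 60-second windows aligned to Unix minutes and sends X-RateLimit-Reset on both responses.

```python
import math
from datetime import datetime, timezone

def window_end(now: float, window_seconds: int = 60) -> int:
    """Unix time (seconds) at which the fixed window containing `now` ends."""
    return (int(now) // window_seconds + 1) * window_seconds

def rate_limit_headers(limit: int, count: int, now: float) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request whose INCR returned `count`."""
    reset = window_end(now)
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - count)),
        "X-RateLimit-Reset": str(reset),
    }
    if count > limit:
        # Whole seconds until the window ends, rounded up so clients don't retry early.
        headers["Retry-After"] = str(math.ceil(reset - now))
        return 429, headers
    return 200, headers

now = datetime(2026, 5, 19, 11, 25, 26, tzinfo=timezone.utc).timestamp()
print(rate_limit_headers(limit=100, count=27, now=now))
print(rate_limit_headers(limit=100, count=101, now=now))
```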

Edge Cases

Redis is down: fail open (allow the request). A few seconds without rate limiting is better than rejecting all traffic. Alert the team.
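
A sketch of the fail-open wrapper. `allow_request` is a hypothetical name, and `check` stands for the Redis-backed decision. The defaults catch Python's built-in ConnectionError and TimeoutError; with redis-py you would pass `redis.exceptions.ConnectionError` and `redis.exceptions.TimeoutError` instead, since those are not subclasses of the built-ins. Pairing this with a short client timeout (for example redis-py's `socket_timeout`) keeps a slow Redis from eating the latency budget.

```python
import logging
from typing import Callable

logger = logging.getLogger("rate_limiter")

def allow_request(check: Callable[[str], bool], user_id: str,
                  unavailable: tuple[type[Exception], ...] = (ConnectionError, TimeoutError)) -> bool:
    """Run the Redis-backed check; if Redis is unreachable, fail open."""
    try:
        return check(user_id)
    except unavailable as exc:
        # Fail open: let the request through, but log so alerting can pick it up.
        logger.error("rate limiter unavailable, failing open: %s", exc)
        return True

def redis_down(user_id: str) -> bool:
    raise ConnectionError("connection refused")

def over_limit(user_id: str) -> bool:
    return False

print(allow_request(redis_down, "u_456"))  # True (fail open)
print(allow_request(over_limit, "u_456"))  # False (normal rejection)
```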

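Clock skew between servers: mostly harmless, but not irrelevant. Each gateway computes the window string from its own clock, so near a minute boundary a server whose clock runs a few hundred milliseconds fast will count a request against the next window's key. With NTP-synchronized clocks this only affects requests within that skew of the boundary, which is acceptable for per-minute limits. If it isn't, derive the window from a single clock, such as the Redis server's time, so every server computes the same string.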
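
Each gateway builds the window string itself. One way to make every server derive the same string regardless of its local clock is to read the time from Redis: redis-py's `time()` returns the server time as a (seconds, microseconds) tuple. The sketch uses a hypothetical `StubRedis` so it runs offline. Note that this costs an extra round trip per request unless the time read is folded into a server-side Lua script together with the INCR.

```python
from datetime import datetime, timezone

def window_from_server_time(redis) -> str:
    """Window string (UTC minute) computed from the Redis server's clock."""
    seconds, _micros = redis.time()  # redis-py returns (unix seconds, microseconds)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M")

class StubRedis:
    """Hypothetical stand-in for a redis-py client; only time() is implemented."""

    def __init__(self, seconds: int) -> None:
        self.seconds = seconds

    def time(self) -> tuple[int, int]:
        return self.seconds, 0

redis_clock = StubRedis(int(datetime(2026, 5, 19, 11, 25, 59, tzinfo=timezone.utc).timestamp()))
# Every gateway asking this Redis gets the same answer, whatever its own clock says.
print(window_from_server_time(redis_clock))  # 2026-05-19T11:25
```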

User switches from free to pro: the tier lookup is cached with a short TTL (5 min), so the new limit takes effect within 5 minutes at most.
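
A sketch of the in-memory tier cache with a 5-minute TTL. `TierCache` and its `load_tier` callback (standing in for whatever database or billing lookup returns a user's tier) are hypothetical; the clock is injectable so expiry can be shown without waiting.

```python
import time
from typing import Callable

TIER_LIMITS = {"free": 100, "pro": 1000}

class TierCache:
    """In-process user -> tier cache; entries are reloaded after `ttl_seconds`."""

    def __init__(self, load_tier: Callable[[str], str], ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load_tier = load_tier  # hypothetical source of truth (database, billing service, ...)
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: dict[str, tuple[str, float]] = {}  # user_id -> (tier, expires_at)

    def limit_for(self, user_id: str) -> int:
        now = self.clock()
        entry = self.entries.get(user_id)
        if entry is None or now >= entry[1]:
            # Missing or stale: reload the tier and restart the TTL.
            entry = (self.load_tier(user_id), now + self.ttl)
            self.entries[user_id] = entry
        return TIER_LIMITS[entry[0]]

# Simulated upgrade with a fake clock (seconds).
tiers = {"u_456": "free"}
fake_now = 0.0
cache = TierCache(lambda uid: tiers[uid], clock=lambda: fake_now)
before = cache.limit_for("u_456")  # loaded at t=0
tiers["u_456"] = "pro"             # user upgrades
fake_now = 120.0
during = cache.limit_for("u_456")  # cached entry is only 2 minutes old
fake_now = 301.0
after = cache.limit_for("u_456")   # entry expired at t=300 and was reloaded
print(before, during, after)       # 100 100 1000
```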

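Burst at window boundary: this is the fixed-window boundary problem. A user could send 100 requests at 0:59 and another 100 at 1:00. For most APIs this is acceptable. If not, switch to a sliding window counter, which estimates the last 60 seconds as the current window's count plus the previous window's count weighted by the fraction of that window still inside the 60-second span.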
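
The sliding window counter's estimate as a small function. This is a sketch assuming 60-second windows and that the previous window's requests were spread evenly across it (that assumption is what makes it an approximation). In Redis it would read two fixed-window keys, the current and the previous minute; the function names are illustrative.

```python
def sliding_window_estimate(prev_count: int, curr_count: int,
                            elapsed: float, window: float = 60) -> float:
    """Estimate requests in the last `window` seconds.

    `elapsed` is how far we are into the current fixed window. The previous
    window's count is weighted by the fraction of it still inside the
    sliding window, assuming its requests were evenly spread.
    """
    overlap = 1 - elapsed / window
    return curr_count + prev_count * overlap

def allow(prev_count: int, curr_count: int, elapsed: float, limit: int) -> bool:
    # Admit the new request only if the estimate is still below the limit.
    return sliding_window_estimate(prev_count, curr_count, elapsed) < limit

# 80 requests last minute, 30 so far this minute, 15 s in: 80 * 0.75 + 30 = 90
print(sliding_window_estimate(80, 30, 15))  # 90.0
# The boundary burst: 100 requests at 0:59, then another request at 1:00
print(allow(100, 0, 0, limit=100))          # False
# 30 s later the estimate counts only half of the previous window: 50 < 100
print(allow(100, 0, 30, limit=100))         # True
```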

Concepts Used

Concept	Lesson	How it's used here
System design process	01	Requirements → Estimation → Design → Deep dive
Caching	07	Tier lookup cached in memory
API gateway	17	Rate limiting happens at the gateway layer
Rate limiting algorithms	19	Fixed-window counter in Redis
Designing for failure	21	Fail open if Redis is down