Design a Rate Limiter for a Public API
This example follows the system design process from lesson 01. We'll go through each step: requirements, estimation, high-level design, and deep dive.
Step 1: Requirements
Functional:
- Limit requests per user based on subscription tier
- Free tier: 100 requests/minute
- Pro tier: 1,000 requests/minute
- Return clear headers so clients know their remaining quota
- Reject excess requests with a 429 status code
Non-functional:
- Must work across multiple API servers (distributed)
- Must add under 1ms latency (can't slow down the request path)
- Must handle 10,000 concurrent users
Step 2: Estimation
Users: 10,000
Average requests per user per minute: 30
Total: 300,000 requests/min = 5,000 QPS
Peak (3x): 15,000 QPS
Storage per user: one counter + TTL = ~50 bytes
Total memory: 10,000 × 50 bytes = 500 KB
This is tiny. A single Redis instance handles millions of keys and 100K+ operations/sec. No sharding needed.
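The same arithmetic as a quick Python check. The inputs are the assumptions listed above (30 requests per user per minute, a 3x peak, roughly 50 bytes per counter), not measurements.

```python
# Inputs are the estimates from the list above, not measurements.
USERS = 10_000
AVG_REQ_PER_USER_PER_MIN = 30
PEAK_MULTIPLIER = 3
BYTES_PER_USER = 50  # rough size of one counter key plus its TTL

requests_per_min = USERS * AVG_REQ_PER_USER_PER_MIN
avg_qps = requests_per_min / 60
peak_qps = avg_qps * PEAK_MULTIPLIER
total_memory_kb = USERS * BYTES_PER_USER / 1000

print(f"{requests_per_min:,} req/min, {avg_qps:,.0f} QPS avg, {peak_qps:,.0f} QPS peak")
# 300,000 req/min, 5,000 QPS avg, 15,000 QPS peak
print(f"~{total_memory_kb:,.0f} KB of counters")
# ~500 KB of counters
```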
Step 3: High-Level Design
   Client
      ↓
 API Gateway
      ↓
Rate Limiter (check Redis)
      │
 ┌────┴────┐
 ▼         ▼
Allow    Reject
 ↓       (429)
Backend
Service
The rate limiter lives in the gateway layer. It checks Redis before forwarding the request. If over limit, it rejects immediately without touching the backend.
Algorithm choice: Fixed-window counter. Simple, fast, and good enough for per-minute limits. The tradeoff is the boundary problem (a user could send 100 requests at 0:59 and 100 at 1:00), but for most APIs this is acceptable. If bursts at boundaries become an issue, upgrade to a sliding window counter.
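A minimal single-process sketch of the fixed-window counter in Python, to make the boundary problem concrete. It keeps counts in a local dictionary and never cleans up old windows, so it illustrates the counting logic only; the distributed version uses Redis as described next. Timestamps are plain seconds, and `FixedWindowCounter` is a hypothetical name.

```python
from collections import defaultdict


class FixedWindowCounter:
    """Single-process fixed-window counter (illustration only, not distributed)."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        # (user_id, window_index) -> count; old windows are never cleaned up here
        self.counts = defaultdict(int)

    def allow(self, user_id: str, now: float) -> bool:
        # Every timestamp in [k*60, (k+1)*60) maps to the same window index k.
        window = int(now // self.window_seconds)
        self.counts[(user_id, window)] += 1  # rejected requests are counted too, like INCR
        return self.counts[(user_id, window)] <= self.limit


limiter = FixedWindowCounter(limit=100)
# Boundary problem: 100 requests at t=59s and 100 more at t=60s all pass,
# because they fall into two different windows.
burst = [limiter.allow("u_456", 59.0) for _ in range(100)]
burst += [limiter.allow("u_456", 60.0) for _ in range(100)]
print(sum(burst))                    # 200
print(limiter.allow("u_456", 60.5))  # False: window 1 already holds 100
```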
Storage: Redis. We have multiple API servers, so we need shared state. A counter in one server's memory doesn't know about requests hitting other servers. Redis INCR is atomic and sub-millisecond.
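A toy simulation of why per-server memory is not enough, under a hypothetical setup: two gateway servers, round-robin load balancing, and a limit of 100. With separate counters each server lets 100 requests through, so the user gets 200. Pointing both servers at one counter restores the limit. In production that shared counter is Redis, whose INCR also keeps the increment atomic across machines; this single-threaded demo does not need atomicity.

```python
LIMIT = 100  # hypothetical per-minute limit


def run(counters_for_server):
    """Send 200 requests within one window; return how many were allowed."""
    allowed = 0
    for i in range(200):
        counter = counters_for_server[i % 2]  # round-robin across two servers
        counter["count"] += 1
        if counter["count"] <= LIMIT:
            allowed += 1
    return allowed


# Each server keeps its own in-memory counter and never sees the other's traffic.
local = [{"count": 0}, {"count": 0}]
print(run(local))  # 200: the user gets double the limit

# Both servers share one counter (the role Redis plays).
shared = {"count": 0}
print(run([shared, shared]))  # 100
```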
Step 4: Deep Dive
Key Design
rate_limit:{user_id}:{window}
Example: rate_limit:u_456:2026-05-19T11:25
One key per user per minute. TTL of 60 seconds (auto-cleanup).
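A sketch of building this key in Python. It assumes the window string is the UTC minute formatted as `YYYY-MM-DDTHH:MM`, which matches the example but is not specified above; `rate_limit_key` is a hypothetical helper.

```python
from datetime import datetime, timezone


def rate_limit_key(user_id: str, now: datetime) -> str:
    """Build rate_limit:{user_id}:{window} with a per-minute window string.

    Assumes `now` is timezone-aware; the window is expressed in UTC.
    """
    window = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")
    return f"rate_limit:{user_id}:{window}"


now = datetime(2026, 5, 19, 11, 25, 42, tzinfo=timezone.utc)
print(rate_limit_key("u_456", now))  # rate_limit:u_456:2026-05-19T11:25
```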
Request Flow
- Request arrives at the gateway
- Extract user ID from the auth token
- Look up their tier (cached in memory: free=100, pro=1000)
- INCR rate_limit:u_456:2026-05-19T11:25 in Redis
- If INCR returns 1 (the first request in this window), set EXPIRE to 60 seconds
- If the counter exceeds the limit, return 429 Too Many Requests
- Otherwise, forward the request to the backend
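The flow above, sketched in Python. `check_rate_limit` is written against an object with `incr(name)` and `expire(name, time)` methods, the same names redis-py's client uses, so a real `redis.Redis` connection could likely be passed in; `FakeRedis` is a hypothetical in-memory stand-in so the example runs without a server. The tier table and the UTC window format are assumptions carried over from the examples.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 60
TIER_LIMITS = {"free": 100, "pro": 1000}  # the cached tier lookup


class FakeRedis:
    """Hypothetical in-memory stand-in exposing incr/expire like redis-py."""

    def __init__(self):
        self.data = {}
        self.ttls = {}

    def incr(self, name):
        self.data[name] = self.data.get(name, 0) + 1
        return self.data[name]

    def expire(self, name, time):
        self.ttls[name] = time  # a real server would delete the key after `time` seconds
        return True


def check_rate_limit(redis_client, user_id, tier, now=None):
    """Return (allowed, count) following the request flow above."""
    if now is None:
        now = datetime.now(timezone.utc)
    window = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")
    key = f"rate_limit:{user_id}:{window}"
    count = redis_client.incr(key)  # atomic increment, creates the key at 1
    if count == 1:
        redis_client.expire(key, WINDOW_SECONDS)  # first hit in the window sets the TTL
    return count <= TIER_LIMITS[tier], count


r = FakeRedis()
t = datetime(2026, 5, 19, 11, 25, 10, tzinfo=timezone.utc)
results = [check_rate_limit(r, "u_456", "free", t)[0] for _ in range(101)]
print(results.count(True), results[-1])  # 100 False
print(r.ttls)  # {'rate_limit:u_456:2026-05-19T11:25': 60}
```

One design note: INCR and EXPIRE are two separate commands, so if the gateway crashes between them the key is left without a TTL. Because the key name already contains the minute, that only leaves a stale key behind rather than breaking the limit; if it matters, the two commands can be sent together in a MULTI/EXEC transaction or a Lua script.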
Response Headers
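A sketch of how a gateway might compute these headers from the counter value and the current time, assuming minute-aligned UTC windows and a reset time given in Unix seconds. It mirrors the examples that follow: `X-RateLimit-Reset` on success, `Retry-After` on rejection. For a request at 11:25:26 UTC in the example window (`2026-05-19T11:25`), it reports 73 remaining after 27 requests, `Retry-After: 34` once over the limit, and a reset time of 1779189960 (2026-05-19T11:26:00Z). `rate_limit_headers` is a hypothetical helper.

```python
import math
from datetime import datetime, timezone

WINDOW_SECONDS = 60


def rate_limit_headers(limit: int, count: int, now: datetime) -> dict:
    """Build rate-limit headers for minute-aligned fixed windows (hypothetical helper)."""
    ts = now.timestamp()  # Unix seconds (float) for a timezone-aware datetime
    window_end = (int(ts) // WINDOW_SECONDS + 1) * WINDOW_SECONDS
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - count)),
    }
    if count > limit:
        # Seconds until the window resets, rounded up so clients don't retry early.
        headers["Retry-After"] = str(math.ceil(window_end - ts))
    else:
        # Unix time at which the current window ends and the counter starts over.
        headers["X-RateLimit-Reset"] = str(window_end)
    return headers


now = datetime(2026, 5, 19, 11, 25, 26, tzinfo=timezone.utc)
print(rate_limit_headers(100, 27, now))
# {'X-RateLimit-Limit': '100', 'X-RateLimit-Remaining': '73', 'X-RateLimit-Reset': '1779189960'}
print(rate_limit_headers(100, 101, now))
# {'X-RateLimit-Limit': '100', 'X-RateLimit-Remaining': '0', 'Retry-After': '34'}
```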
On success:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1779189960
On rejection:
HTTP/1.1 429 Too Many Requests
Retry-After: 34
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
Edge Cases
Redis is down: fail open (allow the request). A few seconds without rate limiting is better than rejecting all traffic. Alert the team.
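A minimal fail-open wrapper in Python. `allow_request` and `broken_check` are hypothetical, and the exception tuple defaults to Python's built-in `ConnectionError` and `TimeoutError` for the demo. With redis-py you would likely pass its own exception classes instead (for example `redis.exceptions.ConnectionError` and `redis.exceptions.TimeoutError`), which derive from redis-py's `RedisError` rather than the built-ins. The `logger.error` call stands in for whatever alerting the team uses.

```python
import logging

logger = logging.getLogger("rate_limiter")


def allow_request(check, redis_errors=(ConnectionError, TimeoutError)):
    """Run the rate-limit check, failing open if the store is unreachable.

    `check` is any zero-argument callable returning True (allow) or False (reject).
    `redis_errors` is the tuple of exceptions treated as "Redis is down".
    """
    try:
        return check()
    except redis_errors:
        # Fail open: let the request through, but make the outage visible.
        logger.error("rate limiter store unavailable; failing open", exc_info=True)
        return True


def broken_check():
    raise ConnectionError("Redis unreachable")  # simulated outage


print(allow_request(lambda: False))  # False: normal rejections still apply
print(allow_request(broken_check))   # True: store down, request allowed
```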
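Clock skew between servers: matters only near window boundaries. Each server builds the window string from its own clock, so a server whose clock runs a second fast starts counting into the next minute's key a second early. With NTP typically keeping skew small, this only blurs the boundary slightly, which a fixed-window limiter already tolerates. If stricter agreement is needed, take the time from a single source, such as Redis's TIME command, instead of each server's local clock.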
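A small illustration of the boundary effect, with hypothetical timestamps: two servers handle requests at the same real instant, half a second before 11:26 UTC, but one server's clock runs one second fast. Because each server formats the window from its own clock, they produce different window strings and increment different keys.

```python
from datetime import datetime, timedelta, timezone


def window_string(now: datetime) -> str:
    # Each server formats the window from its own clock reading.
    return now.strftime("%Y-%m-%dT%H:%M")


# Same real instant, half a second before 11:26 UTC (hypothetical values).
true_time = datetime(2026, 5, 19, 11, 25, 59, 500_000, tzinfo=timezone.utc)
server_a = true_time                         # clock in sync
server_b = true_time + timedelta(seconds=1)  # clock one second fast

print(window_string(server_a))  # 2026-05-19T11:25
print(window_string(server_b))  # 2026-05-19T11:26
```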
User switches from free to pro: the tier lookup is cached with a short TTL (5 min), so they get the new limit within 5 minutes at most.
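A sketch of the in-memory tier cache with a 5-minute TTL. `TierCache` and the `load_tier` callable are hypothetical (in practice the loader might query the user database); the injectable clock lets the example fast-forward time instead of waiting.

```python
import time


class TierCache:
    """In-memory tier cache with a fixed TTL (default 5 minutes)."""

    def __init__(self, load_tier, ttl_seconds=300, clock=time.monotonic):
        self.load_tier = load_tier  # hypothetical lookup, e.g. a DB query
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # user_id -> (tier, expires_at)

    def get(self, user_id):
        now = self.clock()
        entry = self.entries.get(user_id)
        if entry and entry[1] > now:
            return entry[0]  # still fresh: serve the cached tier
        tier = self.load_tier(user_id)  # missing or expired: reload
        self.entries[user_id] = (tier, now + self.ttl)
        return tier


db = {"u_456": "free"}
fake_now = [0.0]
cache = TierCache(db.get, clock=lambda: fake_now[0])

print(cache.get("u_456"))  # free
db["u_456"] = "pro"        # the user upgrades
fake_now[0] = 299.0
print(cache.get("u_456"))  # free (still cached)
fake_now[0] = 300.0
print(cache.get("u_456"))  # pro (entry expired and was reloaded)
```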
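Burst at window boundary: this is the fixed-window boundary problem. A user could send 100 at 0:59 and 100 at 1:00, i.e. 200 requests in about a second. For most APIs this is acceptable. If not, switch to a sliding window counter, which estimates the rolling count as the current window's count plus the previous window's count weighted by how much of the previous window still overlaps the last 60 seconds.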
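A sketch of the sliding window counter's decision, assuming 60-second windows and that the previous window's requests were spread evenly across it (the approximation this algorithm makes). `sliding_window_allow` and `allowed_in_burst` are hypothetical helpers. In the Redis design, `prev_count` and `curr_count` would come from the previous and current minute's keys, which means the TTL would need to cover at least two windows so the previous key is still there. The usage replays the boundary burst: one second into the new window only 1 of 100 extra requests gets through, and halfway through, 50.

```python
def sliding_window_allow(prev_count, curr_count, elapsed, limit, window=60.0):
    """Decide one request with the sliding window counter estimate.

    prev_count: requests counted in the previous fixed window
    curr_count: requests already allowed in the current fixed window
    elapsed:    seconds since the current window started (0 <= elapsed < window)
    """
    # Share of the previous window that still overlaps the last `window` seconds,
    # assuming its requests were spread evenly across it.
    prev_weight = (window - elapsed) / window
    estimated = prev_count * prev_weight + curr_count
    return estimated + 1 <= limit  # allow if this request keeps us within the limit


def allowed_in_burst(elapsed, prev_count=100, limit=100):
    """Fire 100 requests at one instant and count how many get through."""
    allowed = 0
    for _ in range(100):
        if sliding_window_allow(prev_count, allowed, elapsed, limit):
            allowed += 1
    return allowed


print(allowed_in_burst(1.0))   # 1
print(allowed_in_burst(30.0))  # 50
```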
Concepts Used
| Concept | Lesson | How it's used here |
|---|---|---|
| System design process | 01 | Requirements → Estimation → Design → Deep dive |
| Caching | 07 | Tier lookup cached in memory |
| API gateway | 17 | Rate limiting happens at the gateway layer |
| Rate limiting algorithms | 19 | Fixed-window counter in Redis |
| Designing for failure | 21 | Fail open if Redis is down |