02 - Back-of-Envelope Estimation

Why Estimate

Estimation tells you what kind of system you need. The difference between 100 requests per second and 100,000 requests per second is the difference between a single server and a distributed cluster.

You don't need exact numbers. You need the right order of magnitude. Is it thousands or millions? Gigabytes or petabytes? That's what drives architecture decisions.

Latency Numbers That Matter

These are rough values for how long common operations take:

Operation	Time	What it means
L1/L2 cache read	1–10 ns	CPU cache hit
RAM read	100 ns	In-process cache (HashMap, local memory)
SSD random read	10–100 μs	NVMe ~10-20μs, SATA SSD ~100μs
Redis/Memcached round-trip¹	0.5–1 ms	In-memory cache with network hop
Network (same datacenter)	0.5 ms	Calling another service in your cluster
Typical DB query²	1–5 ms	SSD + network + query processing
HDD random read³	10 ms	Spinning disk (cold storage, archival)
Network (cross-continent)	150 ms	A user in Tokyo hitting a server in Virginia

¹ Redis itself reads from RAM in nanoseconds, but the network round-trip dominates. A local in-process cache avoids this entirely.

² What developers actually experience: the disk read plus network plus query parsing and execution.

³ HDDs are mostly used for cold storage and archival now. Modern production databases run on SSDs.

The key insight: reading from memory is roughly 100 to 1,000x faster than SSD and 100,000x faster than HDD. That's why caching matters. And a cross-continent network call adds 150 ms that no amount of code optimization can fix. That's why CDNs matter.
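As a quick check on those ratios, here is a minimal Python sketch that derives them from the table. The dictionary and function names are hypothetical, and where the table gives a range a single representative value is picked, which is an assumption.

```python
# Representative latencies in nanoseconds, taken from the table above.
# Where the table gives a range, one value is picked (an assumption).
LATENCY_NS = {
    "ram": 100,
    "ssd_nvme": 10_000,              # ~10 microseconds
    "ssd_sata": 100_000,             # ~100 microseconds
    "hdd": 10_000_000,               # 10 milliseconds
    "cross_continent": 150_000_000,  # 150 milliseconds
}


def slowdown_vs_ram(operation: str) -> int:
    """Return how many times slower an operation is than a RAM read."""
    return LATENCY_NS[operation] // LATENCY_NS["ram"]


for op in ("ssd_nvme", "ssd_sata", "hdd", "cross_continent"):
    print(f"{op}: {slowdown_vs_ram(op):,}x slower than RAM")
```

The SSD figure spans two orders of magnitude depending on the drive, which is why the memory-versus-SSD gap is better quoted as a range than as a single number.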

Estimating QPS (Queries Per Second)

Start with users, work down to requests:

Daily Active Users (DAU): 10 million
Actions per user per day: 5
Total requests per day: 50 million
Requests per second: 50M / 100K ≈ 500 QPS
Peak (2-3x average): ~1,000–1,500 QPS

"Actions per user per day" depends on the product — a messaging app might be 50, a banking app might be 2. Ask during requirements or make a reasonable assumption and state it.

Why 2-3x for peak? Traffic isn't evenly spread across 24 hours. Most users are active during a few peak hours (evenings, lunch breaks). So the busiest hour might see 3x the average load.

Use 2x when traffic is predictable and spread throughout the day — URL shorteners, file storage, B2B SaaS, email. Use 3x when traffic is spiky and event-driven — social feeds, notifications, e-commerce, chat apps. The core question: how concentrated is your usage? Steady trickle → 2x. Sharp peaks → 3x.
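The users-to-requests calculation can be sketched in a few lines of Python. The function name and the `peak_factor` parameter are hypothetical; the 100K-seconds-per-day shortcut and the sample inputs come from the example above.

```python
SECONDS_PER_DAY_APPROX = 100_000  # rounded up from 86,400 for easy mental math


def estimate_qps(dau: int, actions_per_user: int, peak_factor: int = 3) -> tuple[int, int]:
    """Return (average_qps, peak_qps) from daily active users and actions per user."""
    requests_per_day = dau * actions_per_user
    average_qps = requests_per_day // SECONDS_PER_DAY_APPROX
    return average_qps, average_qps * peak_factor


average, peak = estimate_qps(dau=10_000_000, actions_per_user=5)
print(average, peak)  # 500 1500

# A steadier product would use peak_factor=2 instead.
print(estimate_qps(10_000_000, 5, peak_factor=2))  # (500, 1000)
```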

Splitting reads and writes:

Most systems are read-heavy. If the read-to-write ratio is 10:1:

500 QPS total → ~455 reads/sec + ~45 writes/sec (10 parts reads, 1 part writes, 11 parts total)

This matters because reads and writes scale differently (caching helps reads, queues help writes).
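A ratio like 10:1 divides the total into 11 parts, not 10, which is easy to get wrong under time pressure. The sketch below shows the split; the function name is hypothetical.

```python
def split_reads_writes(total_qps: float, read_ratio: int, write_ratio: int = 1) -> tuple[float, float]:
    """Split total QPS into (reads, writes) for a read_ratio:write_ratio mix."""
    parts = read_ratio + write_ratio
    reads = total_qps * read_ratio / parts
    writes = total_qps * write_ratio / parts
    return reads, writes


reads, writes = split_reads_writes(500, read_ratio=10)
print(round(reads), round(writes))  # 455 45
```

For estimation purposes, rounding these to "about 450 reads and 50 writes" changes nothing about the architecture.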

Rules of thumb:

1 day ≈ 100K seconds (actual: 86,400)
Peak is typically 2-3x the average
Read-heavy systems (social media, news): 10:1 read-to-write ratio is common
Write-heavy systems (logging, analytics): closer to 1:1
A single modern server handles ~10K–50K simple requests/sec — use this to estimate how many servers you need

Estimating Storage

Figure out what you're storing and how much per record:

New users per day: 100,000
Data per user profile: ~1 KB
  (name: 50B + email: 50B + avatar URL: 100B + preferences: 200B + metadata: 600B)
Daily storage growth: 100,000 × 1 KB = 100 MB
Yearly: 100 MB × 365 ≈ 36 GB
Over 5 years: 36 × 5 = 180 GB

Why 5 years? It's a common capacity planning horizon. It also tests whether your design needs data lifecycle management (archival, deletion, tiered storage).

For media (images, videos), the numbers explode:

Photos uploaded per day: 1 million
Average photo size: 500 KB
Daily storage: 1M × 500 KB = 500 GB
Yearly: 500 GB × 365 ≈ 180 TB

180 GB fits in a single database. 180 TB doesn't. That's when you need object storage (S3) and a CDN.
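Both storage calculations follow the same pattern, records per day times bytes per record times days retained. A minimal sketch, with hypothetical names and decimal units (each step ×1,000):

```python
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12  # decimal units, each step x1,000


def storage_growth_bytes(records_per_day: int, bytes_per_record: int, years: int = 1) -> int:
    """Raw bytes accumulated over the given number of years (365 days each)."""
    return records_per_day * bytes_per_record * 365 * years


profiles = storage_growth_bytes(100_000, 1 * KB, years=5)
photos = storage_growth_bytes(1_000_000, 500 * KB, years=1)

print(f"profiles over 5 years: {profiles / GB:.1f} GB")  # 182.5 GB
print(f"photos over 1 year: {photos / TB:.1f} TB")       # 182.5 TB
```

The unrounded results differ from the hand-rounded 180 GB and 180 TB by about 1%, which is irrelevant at this level of precision.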

Rules of thumb:

Multiply raw storage by 3x for indexes, replicas, and backups
Unit conversions: KB → MB → GB → TB, each step is ×1,000
If total storage exceeds a few TB, you likely need sharding or object storage

Estimating Bandwidth

How much data flows through the system per second.

For most API-only systems, bandwidth is rarely the bottleneck. This calculation mainly matters for media-heavy systems (video, images, file uploads).

Outbound (downloads, responses):

Example: a JSON API
QPS: 1,000
Average response size: 10 KB (JSON payload + headers)
Outbound bandwidth: 1,000 × 10 KB = 10 MB/s

Typical response sizes: 1–5 KB for a single object, 10–50 KB for a list of items, 100+ KB for heavy payloads. Estimate based on what your API actually returns.

Inbound (uploads, POST bodies):

Example: a photo-sharing app
Photo uploads per second: 100
Average photo size: 500 KB
Inbound bandwidth: 100 × 500 KB = 50 MB/s

For video streaming:

Concurrent viewers: 100,000
Bitrate: 5 Mbps per viewer
Total bandwidth: 100,000 × 5 = 500 Gbps

500 Gbps is far more than a single origin can realistically serve. That's why every video platform uses a CDN.

⚠️ Watch the units: MB/s is megabytes, Mbps is megabits. 1 MB/s = 8 Mbps. Network specs typically use bits, storage uses bytes.
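The three bandwidth examples, including the bytes-to-bits conversion, can be sketched as follows. The function name is hypothetical, and payload sizes are in decimal kilobytes.

```python
def bandwidth(events_per_second: int, payload_kb: int) -> tuple[float, float]:
    """Return (megabytes per second, megabits per second) for a given load."""
    megabytes_per_second = events_per_second * payload_kb / 1_000
    megabits_per_second = megabytes_per_second * 8  # 1 byte = 8 bits
    return megabytes_per_second, megabits_per_second


api_mb_s, api_mbps = bandwidth(1_000, 10)       # JSON API responses
upload_mb_s, upload_mbps = bandwidth(100, 500)  # photo uploads

print(f"API outbound: {api_mb_s:.0f} MB/s = {api_mbps:.0f} Mbps")          # 10 MB/s = 80 Mbps
print(f"Photo inbound: {upload_mb_s:.0f} MB/s = {upload_mbps:.0f} Mbps")   # 50 MB/s = 400 Mbps

# Video is already quoted in bits: viewers x Mbps per viewer, then Mbps -> Gbps.
video_gbps = 100_000 * 5 / 1_000
print(f"Video outbound: {video_gbps:.0f} Gbps")  # 500 Gbps
```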

Estimating Servers

How many machines do you need:

QPS: 10,000
Requests per server: 500
Servers needed: 10,000 / 500 = 20
With redundancy (2x): 40 servers

Where does "500 requests per server" come from? It's a conservative estimate for a typical web server handling API requests that involve database calls and business logic. CPU-heavy work (image processing, ML) might be 10–50 per server. Static file serving might be 10,000+. Adjust based on what your server actually does.

The 10K–50K number mentioned earlier is for simple/static requests. 500 is for requests that do real work (DB queries, auth, validation).

Why 2x redundancy? It gives you 2N capacity, which is more than N+1 (a single spare): you can lose half your fleet and still serve traffic. Some teams use 3x for critical services.
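A small sketch of the server-count arithmetic, rounding up because you cannot provision a fraction of a machine. The function name and the `redundancy` parameter are hypothetical; the per-server capacities are the rough figures quoted above.

```python
import math


def servers_needed(peak_qps: int, qps_per_server: int, redundancy: int = 2) -> int:
    """Servers required to carry peak_qps, multiplied by a redundancy factor."""
    base = math.ceil(peak_qps / qps_per_server)  # round up to whole machines
    return base * redundancy


print(servers_needed(10_000, 500))     # 40   (typical API with DB calls)
print(servers_needed(10_000, 50))      # 400  (CPU-heavy work)
print(servers_needed(10_000, 10_000))  # 2    (static file serving)
```

The same traffic needs anywhere from 2 to 400 machines depending on the per-server assumption, so that assumption is the one worth stating out loud.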

For memory-bound services (caches):

Total data to cache: 100 GB
RAM per server: 64 GB
Usable RAM: ~50 GB (OS and processes take ~20%)
Cache servers needed: 100 / 50 = 2
With replication (3x): 6 servers

Why 3 copies? One primary for writes, plus two replicas for read distribution and fault tolerance. If one node dies, you still have two copies serving traffic.
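For memory-bound sizing the limiting resource is usable RAM rather than request rate. A minimal sketch, assuming 80% of RAM is usable and every shard is kept in three copies (names are hypothetical):

```python
import math


def cache_servers(data_gb: float, ram_per_server_gb: float,
                  usable_fraction: float = 0.8, copies: int = 3) -> int:
    """Cache nodes needed to hold data_gb, with each shard stored `copies` times."""
    usable_gb = ram_per_server_gb * usable_fraction  # leave headroom for OS and processes
    shards = math.ceil(data_gb / usable_gb)
    return shards * copies


print(cache_servers(100, 64))  # 6
```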

In interviews, exact server counts rarely matter. Interviewers care more about your architecture. But showing you can estimate capacity demonstrates practical thinking.

The Estimation Framework

For any system, estimate these five things:

QPS — from DAU¹ × actions per user ÷ 100K
Storage — from data per record × records per day × retention period
Bandwidth — from QPS × average response size
Memory (cache) — cache the hottest 20% of daily data (Pareto principle: 80% of requests hit 20% of the data). If you serve 10 GB of data per day, cache 2 GB.
Servers — from QPS ÷ per-server capacity

¹ DAU = Daily Active Users — unique users who use your product in a single day.
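The cache-sizing step is the only one in the list that relies on a heuristic rather than a direct product of inputs. As a sketch, with the 20% hot fraction treated as an adjustable assumption and a hypothetical function name:

```python
def cache_size_gb(daily_data_gb: float, hot_fraction: float = 0.2) -> float:
    """Memory to reserve for the hot set, given the data served per day."""
    return daily_data_gb * hot_fraction


print(cache_size_gb(10))  # 2.0
```

If access is more skewed than 80/20, a smaller `hot_fraction` may be enough; if it is closer to uniform, caching helps less and the fraction has to grow.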

Common Pitfalls

Forgetting peak load — average QPS is useless for capacity planning. Design for peak. If your average is 600 QPS but peak is 1,800, your system needs to handle 1,800 without falling over.

Ignoring read vs write ratio: a system with 10K reads/sec and 100 writes/sec needs read replicas and caching. A system with 100 reads/sec and 10K writes/sec needs sharding or write queues. Same total QPS, completely different architecture.

Not accounting for growth — design for where you'll be in 2-3 years, not today. If you're growing 3x per year, today's 1,000 QPS is next year's 3,000.

False precision — saying "we need exactly 847 servers" is silly. Say "roughly 1,000." The goal is order of magnitude, not a procurement order.
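The growth pitfall is worth a quick calculation because growth compounds. A minimal sketch, assuming a constant yearly multiplier (the function name is hypothetical):

```python
def projected_qps(current_qps: int, yearly_growth: int, years: int) -> int:
    """QPS after `years` of compounding at `yearly_growth` times per year."""
    return current_qps * yearly_growth ** years


for year in range(4):
    print(year, projected_qps(1_000, 3, year))
# 0 1000
# 1 3000
# 2 9000
# 3 27000
```

At 3x per year, a 2-3 year horizon means planning for roughly 10x to 30x today's load, a full order of magnitude more than the current numbers suggest.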

Example: URL Shortener

Assumptions:
- 100M new URLs per month
- 10:1 read-to-write ratio
- URLs stored for 5 years
- Average URL record: 100 bytes (short code + original URL)

Writes: 100M / (30 days × 100K sec) = 100M / 3M ≈ 33 writes/sec
Reads: 33 × 10 = 330 reads/sec
Peak reads: 330 × 3 = ~1,000/sec

Storage: 100M × 100 bytes = 10 GB/month
         10 GB × 12 months × 5 years = 600 GB

Cache: 20% of daily data is hot (Pareto principle)
       Daily reads: 330/sec × 100K sec = ~33M reads/day
       Many reads hit the same popular URLs, so 33M is an upper bound on unique URLs accessed
       Hot set: 20% of 33M ≈ 6.6M unique URLs × 100 bytes ≈ 660 MB
       → fits in a single Redis instance

Verdict: ~1,000 QPS peak with 600 GB storage.
A single database with read replicas handles this.
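The whole URL shortener estimate can be reproduced in one short script, which makes it easy to change an assumption and see what moves. All names here are hypothetical; the inputs are the assumptions listed at the top of the example, plus the 3x peak factor and 20% hot fraction used in the calculation.

```python
SECONDS_PER_DAY_APPROX = 100_000  # rounded from 86,400


def url_shortener_estimate(
    new_urls_per_month: int = 100_000_000,
    read_write_ratio: int = 10,
    record_bytes: int = 100,
    retention_years: int = 5,
    peak_factor: int = 3,
    hot_fraction: float = 0.2,
) -> dict[str, float]:
    """Chain the estimation steps: writes -> reads -> peak -> storage -> cache."""
    writes_per_sec = new_urls_per_month / (30 * SECONDS_PER_DAY_APPROX)
    reads_per_sec = writes_per_sec * read_write_ratio
    daily_reads = reads_per_sec * SECONDS_PER_DAY_APPROX
    total_records = new_urls_per_month * 12 * retention_years
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "peak_reads_per_sec": reads_per_sec * peak_factor,
        "storage_gb": total_records * record_bytes / 1e9,
        # Treats every daily read as a distinct URL, so this is an upper bound.
        "hot_set_mb": daily_reads * hot_fraction * record_bytes / 1e6,
    }


for name, value in url_shortener_estimate().items():
    print(f"{name}: {value:,.0f}")
```

Because the script skips the intermediate rounding, its hot-set figure comes out near 670 MB, the same order of magnitude as the hand calculation, and the verdict is unchanged.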

Key Takeaways

Estimate to determine the order of magnitude, not exact numbers
Start with DAU, derive QPS, storage, bandwidth, and server count
Always account for peak load (2-3x average)
Use round numbers: 1 day ≈ 100K seconds
The goal is to justify architecture decisions with numbers, not to be precise