07 - Caching

Why Cache

Databases are slow. Network calls are slow. Computation is slow. Caching stores the result of expensive operations so you don't repeat them.

A Redis cache hit takes under 1ms (it's reading from RAM). A database query typically takes 1-10ms (with indexes and data in buffer pool) but can reach 50-100ms for complex queries or cold reads from disk. At scale, even the 1ms vs 5ms difference matters when multiplied by millions of requests.
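To make the "multiplied by millions of requests" point concrete, here is a small back-of-the-envelope calculation. The request count of 10 million and the assumption that every request is a cache hit are illustrative choices, not measurements.

```python
# Assumed workload: 10 million requests, all served from cache (illustrative only).
requests = 10_000_000
cache_hit_ms = 1   # upper bound for a Redis hit, per the text
db_query_ms = 5    # an indexed query with data in the buffer pool

saved_ms = requests * (db_query_ms - cache_hit_ms)
saved_hours = saved_ms / 1000 / 3600
print(f"{saved_hours:.1f} hours of cumulative latency saved")
```

With a lower hit rate the saving shrinks proportionally, since misses still pay the database cost.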

Cache Levels

Caching happens at multiple layers:

Browser cache — the client stores static assets (images, CSS, JS). Controlled by HTTP headers like Cache-Control and ETag. An ETag is a fingerprint of the file's content. On the next request, the browser sends the ETag back in an If-None-Match header, effectively asking "has this changed?" If the content is unchanged, the server responds with 304 Not Modified instead of re-sending the whole file.
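The ETag exchange described above can be sketched on the server side as follows. This is a minimal illustration, not a real web framework: `make_etag` and `respond` are hypothetical helpers, and hashing the body with SHA-256 is just one possible way to produce a fingerprint.

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Fingerprint the content (hypothetical helper). ETag values are quoted strings."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a GET of `body`."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}  # revalidate before reuse
    if if_none_match == etag:
        return 304, headers, b""    # unchanged: do not re-send the file
    return 200, headers, body       # first request, or content changed

css_v1 = b"body { color: red }"
css_v2 = b"body { color: blue }"

status_first, headers, _ = respond(css_v1, None)
status_repeat, _, payload_repeat = respond(css_v1, headers["ETag"])
status_changed, _, _ = respond(css_v2, headers["ETag"])
```

The first request gets the full file, the repeat request gets an empty 304, and the request after the file changes gets the full file again.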

CDN cache — edge servers store content close to users. Reduces latency and offloads your origin servers.

Application cache — your service stores computed results in memory (Redis, Memcached). The most common layer to design around.

Database cache — the database itself caches query results and frequently accessed pages in memory (buffer pool).

Each layer reduces load on the layer below it.

Cache Strategies

Cache-aside (Lazy Loading) — the application checks the cache first. On a miss, it reads from the database, then writes the result to the cache. Simple and flexible. The downside: the first request for any data is always slow (cold cache).
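A minimal sketch of cache-aside, assuming plain dictionaries stand in for the cache and the database. The names `get`, `update`, and `read_from_db` are made up for illustration.

```python
cache = {}                                  # stands in for Redis/Memcached
database = {"product:1": "Blue Widget"}     # stands in for the real database
db_reads = 0

def read_from_db(key):
    global db_reads
    db_reads += 1
    return database[key]

def get(key):
    if key in cache:                 # hit: no database work
        return cache[key]
    value = read_from_db(key)        # miss: go to the source of truth
    cache[key] = value               # populate the cache for next time
    return value

def update(key, value):
    database[key] = value
    cache.pop(key, None)             # invalidate; the next read repopulates

first = get("product:1")             # cold cache: reads the database
second = get("product:1")            # warm cache: served from memory
reads_before_update = db_reads
update("product:1", "Red Widget")
third = get("product:1")             # miss again after invalidation
```

Note that the application owns all of the logic here; the cache is just a key-value store.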

Write-through — every write goes to both the cache and the database. The cache is always up to date. Downside: writes are slower because each one is written twice, and every write populates the cache regardless of read patterns, so the cache can fill with data that is rarely read (eviction still applies).
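A minimal write-through sketch, again with dictionaries as stand-ins. Writing to the database before the cache is a design choice made here so that a failed database write never leaves the cache ahead of the source of truth; the class name is hypothetical.

```python
class WriteThroughCache:
    def __init__(self, database):
        self.database = database     # stand-in for the real database
        self.cache = {}

    def write(self, key, value):
        self.database[key] = value   # 1. persist to the source of truth
        self.cache[key] = value      # 2. update the cache in the same operation

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.database.get(key)   # only reached if the entry was evicted

scores = WriteThroughCache(database={})
scores.write("score:alice", 120)
scores.write("score:alice", 150)
latest = scores.read("score:alice")
```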

Write-behind (Write-back) — writes go to the cache only. The cache asynchronously flushes to the database. Fast writes, but you risk data loss if the cache crashes before flushing.
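A minimal write-behind sketch. In a real system the flush would run asynchronously on a timer or queue; here `flush` is called explicitly so the window of potential data loss is easy to see. The class and its `dirty` set are assumptions for illustration.

```python
class WriteBehindCache:
    def __init__(self, database):
        self.database = database
        self.cache = {}
        self.dirty = set()           # keys written to the cache but not yet persisted

    def write(self, key, value):
        self.cache[key] = value      # fast path: memory only
        self.dirty.add(key)

    def flush(self):
        """Persist pending writes. Anything still dirty is lost if the cache crashes."""
        for key in list(self.dirty):
            self.database[key] = self.cache[key]
        self.dirty.clear()

db = {}
wb = WriteBehindCache(db)
wb.write("counter", 1)
wb.write("counter", 2)               # coalesced: only the latest value is flushed
persisted_before_flush = "counter" in db
wb.flush()
persisted_after_flush = db.get("counter")
```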

Read-through — the cache itself handles fetching from the database on a miss. The application only talks to the cache. Simplifies application code but couples your cache to your data source.
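A minimal read-through sketch: the cache is constructed with a loader function and the application never touches the database directly. `ReadThroughCache` and `load_user` are hypothetical names.

```python
class ReadThroughCache:
    def __init__(self, loader):
        self._loader = loader        # the cache, not the app, knows how to fetch
        self._store = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._loader(key)   # miss handled inside the cache
        return self._store[key]

load_calls = []

def load_user(key):
    load_calls.append(key)           # record each trip to the "database"
    return {"id": key, "name": "Ada"}

users = ReadThroughCache(loader=load_user)
a = users.get("user:123")
b = users.get("user:123")            # second call never reaches the loader
```

Compared with cache-aside, the miss-handling logic has moved from the application into the cache, which is where the coupling to the data source comes from.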

Cache-aside is the most common strategy in practice because it's simple and flexible. The two examples below show when to use cache-aside and when to use write-through:

Cache Strategies Examples

Example 1: Product catalog (use cache-aside). Users browse products. Reads are 100x more frequent than writes. On a cache miss, fetch from the database and cache it. When a product is updated (rare), invalidate the cache entry. Simple, effective.

Example 2: Leaderboard scores (use write-through). Every game action updates a score, and users constantly view the leaderboard. Writes and reads are both frequent, and stale data is unacceptable. Write-through refreshes the cache on every write, so reads are served from the cache instead of the database.

Cache Invalidation

Cache invalidation is the hardest problem in caching: when the underlying data changes, the cache becomes stale.

Cache Invalidation Strategies

TTL (Time to Live) — cache entries expire after a set duration. Simple but imprecise. Data might be stale for up to the TTL duration.
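A minimal TTL sketch. The injectable `clock` parameter is an assumption added here so that expiry can be demonstrated without actually sleeping; with Redis you would normally set an expiry on the key instead of writing this yourself.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}             # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]     # expired: treat as a miss
            return None
        return value

now = [0.0]                          # fake clock, in seconds
prices = TTLCache(ttl_seconds=60, clock=lambda: now[0])
prices.set("sku:42", 9.99)
now[0] = 59.0
still_cached = prices.get("sku:42")  # possibly stale, but still served
now[0] = 60.0
after_expiry = prices.get("sku:42")  # expired
```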

Event-driven invalidation — when data changes, publish an event that invalidates the relevant cache entries. Precise but complex to implement.
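A minimal sketch of event-driven invalidation using an in-process publish/subscribe helper. In production the bus would typically be a message broker; `subscribe`, `publish`, and the `user.updated` event name are all made up for this example.

```python
from collections import defaultdict

subscribers = defaultdict(list)      # event type -> list of handlers

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    for handler in subscribers[event_type]:
        handler(payload)

database = {"user:123": {"name": "Ada"}}
cache = {"user:123": {"name": "Ada"}}    # a previously cached copy

def invalidate_user(payload):
    cache.pop(f"user:{payload['user_id']}", None)

subscribe("user.updated", invalidate_user)

def update_user(user_id, fields):
    database[f"user:{user_id}"].update(fields)
    publish("user.updated", {"user_id": user_id})   # tell every cache holder

update_user(123, {"name": "Grace"})
```

The complexity the text mentions shows up once events can be delayed, lost, or delivered out of order, none of which this in-process sketch has to deal with.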

Version keys — include a version number in the cache key. When data changes, increment the version. Old entries naturally become unreachable. Example: cache key is user:123:v5. When the user updates their profile, increment to v6. The app now reads user:123:v6 (cache miss, fetches fresh data). The old v5 entry is never requested again and eventually gets evicted by the cache's eviction policy (LRU) or expires via TTL.
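The user:123:v5 example can be sketched as follows. Where the current version number lives is an assumption (here, a `versions` dictionary); in practice it must be something cheap to read, or looking it up defeats the purpose of the cache.

```python
cache = {}
versions = {"user:123": 5}                   # current version per entity (assumed store)
database = {"user:123": {"name": "Ada"}}

def cache_key(entity):
    return f"{entity}:v{versions[entity]}"

def get(entity):
    key = cache_key(entity)
    if key not in cache:
        cache[key] = dict(database[entity])  # miss: copy fresh data into the cache
    return cache[key]

def update(entity, fields):
    database[entity].update(fields)
    versions[entity] += 1                    # old key is now unreachable

before = get("user:123")                     # cached under user:123:v5
update("user:123", {"name": "Grace"})
after = get("user:123")                      # miss on user:123:v6, then cached
```

After the update both `user:123:v5` and `user:123:v6` exist in the cache; the v5 entry simply waits to be evicted or to expire.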

There's no perfect solution. Every approach trades freshness for complexity.

Cache Eviction

Caches have limited memory. When full, something has to go.

LRU (Least Recently Used) — evict the entry that has gone the longest without being accessed. The most common policy.
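A minimal LRU sketch built on Python's `collections.OrderedDict`, which keeps keys in insertion order and can move a key to the end in constant time. The `LRUCache` class itself is illustrative.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()          # least recently used first

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)         # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # drop the least recently used entry

lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")          # "a" becomes the most recently used
lru.put("c", 3)       # cache is full, so "b" is evicted
```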

LFU (Least Frequently Used) — evict the entry accessed the fewest times. Better for workloads with stable hot sets. Example: a CDN caching video thumbnails. The top 1,000 thumbnails get millions of views. If a bot crawls 100,000 old thumbnails once, LRU would evict the popular ones (they're "less recent"). LFU keeps them because their access count is high.
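The thumbnail scenario can be reproduced at toy scale with a naive LFU sketch. The linear scan for the victim is a simplification for readability; real implementations use more efficient bookkeeping. The class and key names are illustrative.

```python
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.counts = {}                     # key -> number of accesses

    def get(self, key):
        if key not in self.values:
            return None
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)   # fewest accesses
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1

thumbs = LFUCache(capacity=2)
thumbs.put("popular.jpg", b"...")
for _ in range(5):
    thumbs.get("popular.jpg")                # the hot entry builds up a high count

# A crawler touches old thumbnails once each.
for name in ["old-1.jpg", "old-2.jpg", "old-3.jpg"]:
    thumbs.put(name, b"...")

popular_survived = "popular.jpg" in thumbs.values
```

Each crawled entry evicts the previous crawled entry, because those are the ones with a count of 1. An LRU cache of the same size would have dropped `popular.jpg` as soon as the second old thumbnail arrived.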

FIFO (First In, First Out) — evict the oldest entry. Simple but doesn't account for access patterns.

Redis defaults to noeviction: once the memory limit is reached, commands that would use more memory return errors, while reads keep working. When you configure an eviction policy (e.g., allkeys-lru), Redis uses an approximated LRU: it samples a few keys and evicts the least recently used among them. Good enough for most workloads.
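The sampling idea can be sketched in a few lines. This illustrates the general technique only and is not Redis's actual implementation; the function name and the `last_access` dictionary are assumptions, and the default sample size of 5 is chosen to echo Redis's `maxmemory-samples` setting.

```python
import random

def evict_approx_lru(last_access, sample_size=5, rng=random):
    """Sample a few keys and evict the least recently used among the sample.

    `last_access` maps key -> last access timestamp (hypothetical bookkeeping).
    """
    k = min(sample_size, len(last_access))
    candidates = rng.sample(list(last_access), k)
    victim = min(candidates, key=last_access.get)   # oldest access in the sample
    del last_access[victim]
    return victim

# 100 keys, where key-0 is the oldest and key-99 the most recently used.
last_access = {f"key-{i}": i for i in range(100)}
victim = evict_approx_lru(last_access, sample_size=5)

# With a sample that covers every key, the result is exact LRU.
small = {"a": 1, "b": 2, "c": 3}
exact_victim = evict_approx_lru(small, sample_size=10)
```

The victim is not guaranteed to be the globally oldest key, but it is never the newest one in the sample, and no fully ordered list of all keys has to be maintained.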

Thundering Herd

When a popular cache entry expires, hundreds of requests simultaneously hit the database for the same data. This is the thundering herd problem.

Solutions:

Lock/mutex — only one request fetches from the database; the others wait for the cache to be repopulated.
Stale-while-revalidate — when the cache entry expires, serve the old (stale) value immediately to the user and trigger a background refresh. The user gets a fast response (no waiting), and later users get fresh data. A few requests may see slightly stale data until the refresh completes, but there is no latency spike.
Probabilistic early expiration — each request has a small random chance of refreshing an entry shortly before it expires, so refreshes are spread out over time instead of all landing at the moment of expiry.
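The lock/mutex approach can be sketched with threads inside a single process. This is a simplified illustration: one global lock is used where a real service would more likely use a per-key lock, and across multiple servers the lock would need to be distributed. The function names and the 50 ms query time are assumptions.

```python
import threading
import time

cache = {}
db_calls = 0
rebuild_lock = threading.Lock()

def slow_db_query(key):
    global db_calls
    db_calls += 1                    # only ever called while holding the lock
    time.sleep(0.05)                 # simulate an expensive query
    return f"value-for-{key}"

def get(key):
    value = cache.get(key)
    if value is not None:
        return value
    with rebuild_lock:               # only one thread rebuilds at a time
        value = cache.get(key)       # re-check: another thread may have filled it
        if value is None:
            value = slow_db_query(key)
            cache[key] = value
    return value

# 20 simultaneous requests for the same expired key.
threads = [threading.Thread(target=get, args=("hot-key",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(db_calls)  # 1
```

The second `cache.get` inside the lock is what prevents the herd: threads that were waiting find the entry already repopulated and skip the database.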

Key Takeaways

Cache at every layer: browser, CDN, application, database
Cache-aside is the most common pattern for application caching
Invalidation is the hard part — choose TTL for simplicity, events for precision
LRU is the most common eviction policy for good reason
Watch for thundering herd on popular keys