06 - Load Balancers

What Load Balancers Do

A load balancer sits between clients and your servers. It distributes incoming requests across multiple backend instances so no single server gets overwhelmed.

Without a load balancer, all traffic hits one server. That server becomes a bottleneck and a single point of failure. With a load balancer, you get both scalability (spread the load) and reliability (if one server dies, traffic routes to the others).

Layer 4 vs Layer 7

Load balancers operate at different network layers:

Layer 4 (Transport) — routes based on IP address and TCP port. It doesn't inspect the request content. Fast and efficient, but dumb. It can't route based on URL path or headers.

A TCP port is a number (0-65535) that identifies a specific service on a machine. If the IP address is the building, the port is the apartment number. By convention, port 80 is HTTP, port 443 is HTTPS, port 5432 is PostgreSQL, and port 6379 is Redis. L4 load balancers see only addresses and port numbers, not what's inside the request.

Layer 7 (Application) — inspects HTTP headers, URLs, cookies. It can route /api requests to one set of servers and /static to another. Slower than L4 but much more flexible.

Most modern web applications use L7 load balancers. The overhead is negligible for typical workloads, and the routing flexibility is worth it.
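The difference between the two layers can be sketched as two routing functions: one that sees only the destination port, and one that also sees the URL path. This is a minimal illustration, not how any particular load balancer is implemented. The `Request` type, the pool tables, and every address in them are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    client_ip: str
    dest_port: int
    path: str  # only an L7 balancer looks at this field


# Hypothetical backend pools; the addresses are placeholders.
L4_POOLS = {443: ["10.0.0.1", "10.0.0.2"], 5432: ["10.0.1.1"]}
L7_ROUTES = [("/api", ["10.0.2.1", "10.0.2.2"]), ("/static", ["10.0.3.1"])]
L7_DEFAULT = ["10.0.4.1"]


def l4_pool(req: Request) -> list[str]:
    """L4: choose a pool from the destination port only."""
    return L4_POOLS[req.dest_port]


def l7_pool(req: Request) -> list[str]:
    """L7: choose a pool from the URL path (first matching prefix wins)."""
    for prefix, pool in L7_ROUTES:
        if req.path.startswith(prefix):
            return pool
    return L7_DEFAULT
```

Two HTTPS requests for /api/users and /static/logo.png look identical to `l4_pool` (same port, same pool), while `l7_pool` sends them to different pools.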

Routing Algorithms

How does the load balancer decide which server gets the next request?

Round Robin — each server gets a turn in sequence. Simple, works well when all servers are identical.

Weighted Round Robin — servers with more capacity get more requests. Useful when your fleet has mixed hardware.
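Both round-robin variants can be sketched in a few lines. This is a simplified illustration: the function names are made up for this example, and the weighted version uses naive list expansion, whereas real balancers often interleave picks more smoothly.

```python
import itertools


def round_robin(servers):
    """Yield servers in a fixed, repeating sequence."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in proportion to integer weights.

    Naive expansion: a server with weight 3 appears 3 times per cycle.
    """
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)
```

With `weighted_round_robin({"big": 3, "small": 1})`, every cycle of four picks sends three requests to "big" and one to "small".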

Least Connections — send to the server with the fewest active connections. Good for long-lived requests where processing time varies. The load balancer tracks this by incrementing a counter when it sends a request and decrementing when the response comes back. Since all traffic flows through it, no extra coordination is needed.
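The counter bookkeeping described above might look like the following sketch. The class and method names are hypothetical, and a real balancer would also need to handle concurrency and servers that disappear mid-request.

```python
class LeastConnections:
    """Track in-flight requests per server and pick the least busy one."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Pick the server with the fewest active connections and count the new one."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when the response has been sent back to the client."""
        self.active[server] -= 1
```

Because every request passes through `acquire` and every response through `release`, the balancer's own counters are always up to date without asking the servers anything.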

IP Hash — hash the client's IP to always route them to the same server. Useful for session affinity. With naive modulo hashing, adding or removing servers remaps most clients. Consistent hashing (covered in Lesson 11) solves this by only remapping ~1/N of clients when the pool changes.
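To see why naive modulo hashing remaps most clients, the sketch below hashes client IPs onto a server list and measures how many clients move when the pool grows. The function names, the choice of SHA-256, and the server labels are assumptions made for illustration only.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to a server with hash(ip) mod N."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def remapped_fraction(clients, before, after):
    """Fraction of clients whose server changes between two pools."""
    moved = sum(pick_server(c, before) != pick_server(c, after) for c in clients)
    return moved / len(clients)


clients = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
before = ["s1", "s2", "s3", "s4"]
after = before + ["s5"]
fraction = remapped_fraction(clients, before, after)
```

Going from 4 to 5 servers, a client keeps its server only when its hash gives the same remainder mod 4 and mod 5, which happens for about one client in five. So `fraction` should come out at roughly 0.8, compared with the roughly 1/5 that consistent hashing would move.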

Random — pick a server at random. Surprisingly effective at scale due to the law of large numbers.
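A quick simulation shows the effect. This is only an illustrative sketch: the function name is hypothetical, and the fixed seed is there to make runs repeatable.

```python
import random
from collections import Counter


def random_shares(servers, n_requests, seed=0):
    """Assign each request to a random server and return each server's share."""
    rng = random.Random(seed)
    counts = Counter(rng.choice(servers) for _ in range(n_requests))
    return {server: counts[server] / n_requests for server in servers}


shares = random_shares(["a", "b", "c", "d"], 100_000)
```

With 100,000 requests across four servers, each share lands very close to 0.25. With only a handful of requests the split can be quite uneven, which is why the approach works best at scale.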

Health Checks

A load balancer needs to know which servers are healthy. It does this by periodically sending health check requests (usually HTTP GET to a /health endpoint).

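If a server fails its health checks (typically several in a row, so that one dropped probe doesn't eject it), the load balancer stops sending traffic to it. Once the server starts passing checks again, traffic resumes.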

Two types:

Active — the load balancer pings servers on a schedule
Passive — the load balancer monitors real traffic and marks servers unhealthy after repeated failures

Most production setups use both.
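The bookkeeping behind both kinds of check can be sketched as a small state machine that counts consecutive results. The class, the field names `fall` and `rise`, and the default thresholds are assumptions for illustration; real load balancers expose similar settings under their own names.

```python
from dataclasses import dataclass


@dataclass
class ServerHealth:
    fall: int = 3         # assumed: consecutive failures before marking unhealthy
    rise: int = 2         # assumed: consecutive successes before marking healthy
    healthy: bool = True
    _streak: int = 0      # consecutive results that contradict the current state

    def record(self, ok: bool) -> bool:
        """Record one result, from an active probe or an observed real request."""
        if ok == self.healthy:
            self._streak = 0          # result agrees with the current state
            return self.healthy
        self._streak += 1
        threshold = self.rise if ok else self.fall
        if self._streak >= threshold:
            self.healthy = ok         # flip state once the threshold is reached
            self._streak = 0
        return self.healthy
```

An active checker would call `record` with the outcome of each scheduled probe, while a passive checker would call it with the outcome of each real request. Either way, a single failure does not remove the server from rotation.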

Single Point of Failure

The load balancer itself can fail. The solution: run multiple load balancers in an active-passive or active-active configuration.

Active-passive — one handles traffic, the other monitors via heartbeat. If the primary dies, the secondary takes over its IP (via VRRP/virtual IP). Simpler: one IP to manage, no DNS coordination, no connection state to sync. Use when one LB can handle your full traffic load.

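Active-active — both load balancers handle traffic simultaneously. DNS returns both IPs. If one dies, the other absorbs all traffic, so each LB must be sized for the full load. You get better utilization and no takeover step, but you need DNS-level routing, and clients holding a cached record for the dead LB can see errors until they retry the other IP or the record is withdrawn. Use when you want both LBs serving traffic in normal operation or cannot tolerate the takeover delay of active-passive.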
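For the active-passive case, the standby's side of the heartbeat logic can be sketched as below. This is a simplified model with explicit timestamps, not how keepalived or VRRP is actually implemented. The class name, `timeout_s`, and the method names are hypothetical.

```python
class Standby:
    """Secondary load balancer that watches the primary's heartbeat."""

    def __init__(self, timeout_s: float, now: float):
        self.timeout_s = timeout_s      # assumed: silence tolerated before takeover
        self.last_heartbeat = now
        self.owns_virtual_ip = False

    def on_heartbeat(self, now: float) -> None:
        """Called whenever a heartbeat from the primary arrives."""
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        """Called periodically; returns True once this node holds the virtual IP."""
        if not self.owns_virtual_ip and now - self.last_heartbeat > self.timeout_s:
            # A real setup would claim the virtual IP here (e.g. via VRRP).
            self.owns_virtual_ip = True
        return self.owns_virtual_ip
```

The timeout is a trade-off: a short one speeds up failover but risks a false takeover when a heartbeat is merely delayed.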

In practice, cloud LBs (AWS ALB, GCP Load Balancer, Cloudflare) handle redundancy for you. They run redundant load balancers across availability zones and are active-active behind the scenes, so you don't choose a mode. Active-passive is more common in self-managed setups (e.g., two Nginx boxes with keepalived) because active-active requires solving extra problems yourself: DNS health checks, config sync across nodes, and session state sharing. Active-passive is just "secondary takes the IP when primary dies": one tool, one problem to solve.

Key Takeaways

Load balancers distribute traffic and provide fault tolerance
L4 is fast but limited; L7 gives content-aware routing
Choose the routing algorithm based on your workload characteristics
Health checks detect and remove unhealthy servers automatically
Run redundant load balancers to avoid a single point of failure