14 - PACELC and Consensus
📋 Jump to TakeawaysBeyond CAP
CAP theorem only describes what happens during a network partition. But partitions are rare. What about the other 99.9% of the time?
That's where PACELC comes in. It extends CAP to describe the tradeoff during normal operation too.
PACELC Explained
The name breaks down as:
- Partition → choose Availability or Consistency
- Else (no partition) → choose Latency or Consistency
During normal operation, strong consistency still has a cost: latency. Every write must be confirmed by multiple nodes before it's acknowledged. Every read must check the latest version. That coordination takes time.
So even when the network is healthy, you're choosing between fast responses (low latency) and guaranteed freshness (consistency).
PACELC Classifications
PA/EL (Available + Low Latency) — prioritizes speed in all conditions. During partitions: stay available. During normal operation: minimize latency.
Example: DynamoDB, Cassandra. Reads are fast because they don't wait for all replicas to agree. During partitions, they keep serving (possibly stale) data.
PC/EC (Consistent always) — prioritizes correctness in all conditions. During partitions: reject requests. During normal operation: wait for coordination.
Example: PostgreSQL with synchronous replication, ZooKeeper. Writes are slower because they wait for replicas to confirm. During partitions, they refuse to serve rather than risk inconsistency.
PA/EC (Available during partitions, Consistent otherwise) — a pragmatic middle ground. During normal operation, pay the latency cost for consistency. During partitions, switch to availability.
Example: MongoDB (with readPreference: secondary). Normally reads from the primary (consistent). During a partition, reads from secondaries (available but possibly stale). Note: MongoDB's default (readPreference: primary) is closer to PC/EC — the minority partition refuses reads.
Why This Matters
PACELC helps you evaluate databases and distributed systems more precisely than CAP alone. When someone says "we use Cassandra," you know it's PA/EL: fast reads, eventual consistency, stays available during partitions. When someone says "we use CockroachDB," you know it's PC/EC: slower writes, strong consistency, may reject requests during partitions.
This informs your architecture decisions. If your feature needs sub-10ms reads and can tolerate stale data, pick a PA/EL system. If it needs correctness above all else, pick PC/EC and accept the latency.
Consensus Algorithms
Strong consistency requires nodes to agree on the current state. How do they agree? Consensus algorithms.
The core problem: multiple nodes need to accept the same value in the same order, even if some nodes crash or messages are delayed.
Raft
Raft is the most widely used consensus algorithm today. It works by electing a leader:
- One node is elected leader. It handles all writes.
- The leader replicates each write to follower nodes.
- Once a majority of nodes confirm the write, it's committed.
- If the leader crashes, followers detect the timeout and elect a new leader.
The key insight: only the leader accepts writes, and a write is only committed when a majority acknowledges it. This guarantees that committed data survives any single node failure.
Used by: etcd (Kubernetes' state store), CockroachDB, Consul, TiKV.
Consul is HashiCorp's service mesh and service discovery tool. It maintains a distributed key-value store for configuration, tracks which services are healthy, and handles service-to-service networking. It uses Raft internally to keep this state consistent across nodes.
Paxos
The original consensus algorithm, written by Leslie Lamport in 1989 but not formally published until 1998. Mathematically proven correct but notoriously difficult to implement and understand.
Raft was explicitly designed as "Paxos made understandable." In practice, most new systems choose Raft. You'll encounter Paxos in older systems and academic papers.
The Cost of Consensus
Consensus adds latency to every write:
- The leader must send the write to followers
- It must wait for a majority to respond
- Only then can it acknowledge the write to the client
With 5 nodes across regions, this could add 50-150ms per write. That's the price of strong consistency in a distributed system.
This is why many systems offer tunable consistency: use consensus for critical writes (payments) and skip it for non-critical ones (view counts).
Key Takeaways
- PACELC extends CAP: even without partitions, you trade latency for consistency
- PA/EL systems are fast but eventually consistent; PC/EC systems are correct but slower
- Raft is the standard consensus algorithm: leader-based, majority-confirms
- Consensus guarantees correctness but adds latency to every write
- Choose your consistency level per feature, not per system