13 - Consistency and CAP Theorem
📋 Jump to TakeawaysCAP Theorem
The CAP theorem states that a distributed system can only guarantee two of three properties simultaneously:
- Consistency — every read returns the most recent write
- Availability — every request receives a response (even if it's stale)
- Partition tolerance — the system continues operating despite network failures between nodes
Network partitions are inevitable in distributed systems. A partition means nodes can't communicate — a cable gets cut, a switch fails, or a cloud provider has a networking issue. Both nodes are still running, but they can't reach each other. You can't prevent this, only design for what happens when it occurs. So the real choice is between consistency and availability when a partition happens.
CP vs AP Systems
CP (Consistency + Partition tolerance) — during a partition, the system rejects requests rather than return stale data. The system is unavailable but never inconsistent.
Examples: ZooKeeper, HBase, MongoDB (with majority write concern — meaning writes aren't confirmed until a majority of nodes acknowledge them, so if most nodes are unreachable, writes are rejected rather than risk inconsistency).
Use when: correctness matters more than uptime. Financial transactions, inventory counts, leader election.
AP (Availability + Partition tolerance) — during a partition, the system returns whatever data it has, even if stale. The system stays available but may be inconsistent.
Examples: Cassandra, DynamoDB, CouchDB.
Use when: uptime matters more than perfect accuracy. Social media feeds, product catalogs, DNS.
Consistency Models
Not all consistency is the same. There's a spectrum from strict to relaxed:
Strong consistency — after a write completes, all subsequent reads see that write. Simple to reason about but expensive (requires coordination between nodes). Every read goes to the leader or waits for replication to complete.
Eventual consistency — after a write, replicas will converge to the same value eventually. No guarantee on when. Reads may return stale data for seconds or even minutes. The most common model in distributed NoSQL systems.
Causal consistency — if operation A causes operation B, everyone sees A before B. Unrelated operations (concurrent) may be seen in different orders by different nodes. Stronger than eventual, cheaper than strong.
Example: in a chat app, User A posts "Anyone want lunch?" and User B replies "Sure, where?". Causal consistency guarantees every node sees A's message before B's reply. But if User C independently posts "Nice weather today" at the same time, different nodes might show C's message before or after A's — that's fine because C's message has no causal relationship to A's.
Read-your-writes — a user always sees their own writes immediately. Other users may see stale data. Common in web applications (you post a comment and immediately see it, even if others don't yet). Often implemented by routing reads to the leader for the user who just wrote.
Practical Tradeoffs
In practice, most systems don't pick one consistency model for everything. They mix based on the use case:
- Strong consistency: user authentication (you can't have stale passwords), payment processing
- Eventual consistency: social feed (seeing a post 2 seconds late is fine), product view counts
- Causal consistency: comment threads. If User A posts a comment and User B replies, everyone must see A's comment before B's reply. But unrelated comments on different posts can appear in any order.
- Read-your-writes: after you post a tweet, you immediately see it on your own timeline. Other users might see it a few seconds later (eventual), but you'd be confused if your own post disappeared after submitting.
- Mixed: inventory count uses strong consistency for the final purchase (can't oversell) but eventual consistency for the display count on the product page (showing "3 left" vs "4 left" for a second is fine).
Design per use case, not per system. A single application might use strong consistency for payments and eventual consistency for notifications.
Logical Clocks and Vector Clocks
Distributed systems need to track the order of operations across nodes. Physical clocks drift and can't be perfectly synchronized, so systems use logical mechanisms instead.
Lamport clocks — each node maintains a counter. On every operation, increment the counter. When sending a message, include the counter. When receiving, set your counter to max(local, received) + 1. This gives a partial causal ordering (if A caused B, then L(A) < L(B)), but the converse isn't true — if L(A) < L(B), you can't tell whether B was caused by A or just happened later independently. Lamport clocks cannot detect concurrent events.
Vector clocks — each node maintains a vector of counters, one per node. Node X increments vector[X] on every local operation. When sending, attach the full vector. When receiving, merge by taking the element-wise max, then increment your own entry.
Node A: [A:1, B:0, C:0] → writes "Anyone want lunch?"
Node B: [A:1, B:1, C:0] → sees A's write, replies "Sure, where?"
Node C: [A:0, B:0, C:1] → independently writes "Nice weather"Comparing vectors tells you the relationship:
- If every entry in V1 ≤ corresponding entry in V2 → V1 happened before V2 (causal)
- If some entries are greater and some are less → concurrent (no causal relationship)
Node B's vector [A:1, B:1, C:0] shows it saw A's operation (A:1), so B's reply is causally after A's message. Node C's vector [A:0, B:0, C:1] has never seen A or B — it's concurrent with both.
Lamport Clocks vs Vector Clocks
| Lamport Clock | Vector Clock | |
|---|---|---|
| Size | Single integer | N integers (one per node) |
| Detects causality | No (only ordering) | Yes |
| Detects concurrency | No | Yes |
| Scalability | Excellent | Degrades with many nodes |
| Used by | Raft log index, Kafka offsets | Riak (dotted version vectors), CRDTs |
Vector clocks grow with the number of nodes. For systems with thousands of nodes, alternatives like dotted version vectors or hybrid logical clocks (HLC) combine physical timestamps with logical counters to keep size bounded while still detecting causality.
Key Takeaways
- CAP theorem: during a partition, choose consistency or availability
- CP systems reject requests to stay consistent; AP systems serve stale data to stay available
- Strong consistency is expensive; eventual consistency is cheap but stale
- Mix consistency models within a system based on each feature's requirements
- Most web applications use eventual consistency for reads and strong consistency for critical writes