Ojaswi Athghara | CAP Theorem Explained Simply: Consistency, Availability, Partition Tolerance

CAP Theorem Explained Simply: Consistency, Availability, Partition Tolerance

The Impossible Triangle: Pick Two, You Can't Have Three

You want your distributed database to be:

Consistent (all users see the same data)
Available (system always responds to requests)
Partition Tolerant (works even when network fails)

CAP Theorem says: You can only pick 2 out of 3.

It's like this triangle—you can't have all three corners simultaneously:

        Consistency
             /\
            /  \
           /    \
          /  CA  \
         /        \
        /----------\
Availability     Partition
                 Tolerance

This isn't just a rule of thumb. It's a fundamental limit of distributed systems: Eric Brewer proposed it in 2000, and Seth Gilbert and Nancy Lynch proved it mathematically in 2002.

Instagram chooses CP (consistency over availability during network partitions). Amazon chooses AP (availability over consistency—you might see stale product info, but checkout always works).

There's no right answer. Only trade-offs.

In this guide, I'll explain CAP theorem simply, show real-world examples, and help you understand which databases choose what—and why. Let's dive in.

What Is CAP Theorem?

CAP Theorem = In a distributed system, you can guarantee at most 2 of 3 properties:

Consistency (C)
Availability (A)
Partition Tolerance (P)

Proposed by: Eric Brewer (2000); formally proven by Seth Gilbert and Nancy Lynch (2002)

The Three Properties

1. Consistency (C)

Definition: All nodes see the same data at the same time

In practice: After a write, all subsequent reads return the updated value

Example:

Consistent system:

User updates profile picture
    ↓
Write to Database Node 1
    ↓
Synchronously replicate to Node 2, Node 3
    ↓
Once all nodes updated: Return success
    ↓
Next read from any node: Returns new picture ✅

Non-consistent (eventual consistency):

User updates profile picture
    ↓
Write to Database Node 1
    ↓
Asynchronously replicate to Node 2, Node 3 (takes 100ms)
    ↓
Return success immediately
    ↓
Read from Node 2 (before replication): Returns old picture ❌
Wait 100ms, read again: Returns new picture ✅
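
The two flows above can be sketched with a few in-memory objects. This is a minimal illustration, not how any particular database replicates: `Node`, `write_sync`, `write_async`, and `replicate` are hypothetical names, and the "replication delay" is modeled as a queue that is drained later.

```python
class Node:
    """A replica holding a single value (hypothetical in-memory stand-in)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def write_sync(nodes, value):
    """Consistent write: update every replica before acknowledging."""
    for node in nodes:
        node.value = value
    return "success"


def write_async(nodes, value, pending):
    """Eventually consistent write: update one replica, queue the rest."""
    nodes[0].value = value
    pending.extend((node, value) for node in nodes[1:])
    return "success"


def replicate(pending):
    """Apply queued updates, as background replication eventually would."""
    while pending:
        node, value = pending.pop(0)
        node.value = value


nodes = [Node(f"node{i}", "old.jpg") for i in (1, 2, 3)]
pending = []

write_async(nodes, "new.jpg", pending)
stale_read = nodes[1].value   # read from Node 2 before replication
replicate(pending)
fresh_read = nodes[1].value   # read from Node 2 after replication

print(stale_read, fresh_read)  # old.jpg new.jpg
```

With `write_sync`, the stale read cannot happen, because success is only returned after every replica holds the new value. The price is that the write has to wait for the slowest replica.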

Real Example: Banking

You transfer $100 from Account A to Account B:

Consistent:
  Account A: $500 → $400 (immediate)
  Account B: $200 → $300 (immediate)
  Any ATM, any location: Shows $400 and $300 ✅

Eventual consistency:
  Account A: $500 → $400 (immediate)
  Account B: $200 → $300 (after 1 second)
  ATM on East Coast: Shows $400 and $200 (inconsistent!) ❌
  1 second later: Shows $400 and $300 ✅

For banking, consistency is mandatory.

2. Availability (A)

Definition: Every request to a working node receives a non-error response, without a guarantee that it contains the most recent data

In practice: System is always up, always responding (even if data is stale)

Example:

Available system:

User requests profile
    ↓
Server 1 is down ❌
    ↓
Request routed to Server 2
    ↓
Server 2 responds (maybe with cached/stale data)
    ↓
User gets a response ✅ (even if not perfectly up-to-date)

Unavailable system:

User requests profile
    ↓
Server 1 is down ❌
    ↓
System requires Server 1 for consistency
    ↓
Return error: "Service unavailable" ❌
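
As a rough sketch of the difference, the function below tries each server in turn and returns the first response it gets. The servers are plain Python functions and `ServerDown` is a made-up exception; both are assumptions for illustration only.

```python
class ServerDown(Exception):
    """Raised by a server that cannot answer (hypothetical)."""


def read_profile(servers, user_id):
    """Return the first response from any server; fail only if all are down."""
    for server in servers:
        try:
            return server(user_id)
        except ServerDown:
            continue  # route the request to the next server
    raise ServerDown("Service unavailable")


def server1(user_id):
    raise ServerDown("server 1 is down")


def server2(user_id):
    # Answers from a cache, so the data may be stale
    return {"user_id": user_id, "picture": "old.jpg", "source": "server2-cache"}


profile = read_profile([server1, server2], 42)
print(profile["source"])  # server2-cache
```

If the system insisted on Server 1 (that is, `read_profile([server1], 42)`), the same call would raise `ServerDown`, which is the unavailable case.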

Real Example: Amazon Shopping

Black Friday, millions of users:

Available system:
  - Show product info (even if stock count slightly stale)
  - Allow adding to cart
  - Process checkout
  - Better to show "possibly 5 left" than "Site down"

Unavailable system:
  - Wait for perfect stock count from all warehouses
  - If one warehouse offline: Show "Service unavailable"
  - Lose sales ❌

Amazon chooses availability (you might see stale stock, but site works).

3. Partition Tolerance (P)

Definition: System continues to operate despite network failures (network partition)

Network partition = Servers can't communicate (due to network failure, router issue, cable cut)

Example:

Partition-tolerant system:

        [Node 1 - US East]
              ⚡ Network partition! ⚡
        [Node 2 - US West]

Partition occurs (network failure)
    ↓
Node 1 and Node 2 can't communicate
    ↓
Both nodes continue accepting requests independently ✅
    ↓
When network heals: Reconcile differences

Not partition-tolerant:

        [Node 1 - US East]
              ⚡ Network partition! ⚡
        [Node 2 - US West]

Partition occurs
    ↓
System stops accepting writes (to maintain consistency)
    ↓
Users see errors ❌

Reality: In distributed systems, network failures are inevitable.

Cables get cut
Routers fail
Data centers lose connectivity
Cloud providers have outages

Therefore: Partition tolerance is NOT optional. You must handle it.

The Real Choice: CP vs AP

Since partition tolerance is mandatory (networks fail), the real choice is:

CP (Consistency + Partition Tolerance) vs AP (Availability + Partition Tolerance)

CP Systems (Consistency + Partition Tolerance)

Trade-off: Sacrifice availability during network partitions

During partition:

Network partition occurs
    ↓
System cannot guarantee consistency across nodes
    ↓
Reject writes (or reads) until partition heals
    ↓
Users see errors ❌ (temporarily unavailable)

Once partition heals:

Nodes reconnect
    ↓
Data consistent across all nodes ✅
    ↓
System resumes normal operation

CP Databases

1. MongoDB (CP-leaning)

How it works:

MongoDB Replica Set:
  - 1 Primary (handles writes)
  - 2 Secondaries (replicas)

Network partition:
  - Primary becomes isolated (can't reach majority)
  - Primary steps down (no longer accepts writes) ❌
  - Secondaries elect new primary
  - Writes go to new primary ✅

Result:

Consistency: Reads from the primary return the latest data
Availability: During failover (roughly 10-30 seconds until a new primary is elected), writes are unavailable ❌

When to use: When consistency is critical (financial data, inventory)
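
The majority rule behind this behavior can be sketched in a few lines. This is a simplified model, not MongoDB's actual election protocol: `side_with_primary` is a hypothetical helper that only answers which side of a partition still holds a strict majority of voting members.

```python
def side_with_primary(partition_sides):
    """Return the index of the side that holds a strict majority, or None.

    partition_sides: number of voting members on each side of the partition.
    """
    total = sum(partition_sides)
    for index, size in enumerate(partition_sides):
        if 2 * size > total:  # strict majority of all members
            return index
    return None


# 3-member replica set: old primary isolated alone, two secondaries together
print(side_with_primary([1, 2]))  # 1
# 4 members split evenly: nobody has a majority, so nobody accepts writes
print(side_with_primary([2, 2]))  # None
```

Because at most one side can hold a strict majority, two primaries can never accept writes at the same time, which is how consistency is preserved at the cost of availability on the minority side.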

2. HBase (CP)

Used by: Facebook Messages

During partition:

Region server loses connection to master
Stops serving requests until reconnected
Ensures consistency

3. Redis (with sentinel, CP-leaning)

How it works:

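Network partition:
  - Master isolated
  - Sentinels detect failure (quorum vote)
  - Promote a replica to master
  - During failover (10-30 seconds): Writes unavailable ❌
  - Replication is asynchronous, so writes acknowledged by the old master just before failover can be lost ⚠️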

Real-World Example: Bank Transfer (CP System)

Scenario: Transfer $100 from Account A to Account B

CP System (PostgreSQL with synchronous replication):

Step 1: Deduct $100 from Account A
Step 2: Add $100 to Account B
Step 3: Commit transaction (only if both nodes acknowledge)

Network partition occurs during Step 2:
  - Node 2 unreachable
  - Transaction cannot complete
  - Rollback (Account A unchanged)
  - User sees error: "Transaction failed, try again" ❌

Result:
  - Consistent: Money not lost ✅
  - Not available: Transaction couldn't complete ❌

Better than: Money disappearing!
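
Here is a minimal sketch of the "commit only if both nodes acknowledge" rule. It is not PostgreSQL's replication protocol; `Replica` and `transfer` are hypothetical in-memory stand-ins that only show the all-or-nothing outcome.

```python
class ReplicaUnreachable(Exception):
    """Raised when a replica cannot be contacted (hypothetical)."""


class Replica:
    def __init__(self, balances, reachable=True):
        self.balances = dict(balances)
        self.reachable = reachable

    def apply(self, balances):
        if not self.reachable:
            raise ReplicaUnreachable()
        self.balances = dict(balances)


def transfer(replicas, src, dst, amount):
    """Apply the transfer on every replica, or on none of them."""
    before = dict(replicas[0].balances)
    after = dict(before)
    after[src] -= amount
    after[dst] += amount

    applied = []
    try:
        for replica in replicas:
            replica.apply(after)
            applied.append(replica)
    except ReplicaUnreachable:
        for replica in applied:
            replica.balances = dict(before)  # roll back what was applied
        return "Transaction failed, try again"
    return "ok"


start = {"A": 500, "B": 200}
node1 = Replica(start)
node2 = Replica(start, reachable=False)  # partition: Node 2 unreachable

result = transfer([node1, node2], "A", "B", 100)
print(result, node1.balances)
```

During the partition the call returns the error message and Node 1 still shows the original balances. Once `node2.reachable` is set back to `True`, the same call succeeds on both replicas.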

AP Systems (Availability + Partition Tolerance)

Trade-off: Sacrifice consistency during network partitions

During partition:

Network partition occurs
    ↓
Nodes accept writes independently (can't sync)
    ↓
Data diverges (different values on different nodes) ⚠️
    ↓
Users get responses ✅ (but possibly stale data)

Once partition heals:

Nodes reconnect
    ↓
Reconcile differences (conflict resolution)
    ↓
Eventually consistent ✅

AP Databases

1. Cassandra (AP)

How it works:

Cassandra Cluster: 10 nodes

Network partition:
  - Node 1-5 in US East (can't reach US West)
  - Node 6-10 in US West (can't reach US East)
  - Both sides continue accepting writes ✅

User A writes to US East: "status = online"
User B writes to US West: "status = offline"

Result: Conflicting data ⚠️

When partition heals:
  - Cassandra uses "last write wins" (timestamp-based)
  - Conflict resolved

Result:

Available: Always accepting requests ✅
Consistent: During partition, data diverges ❌ (eventually consistent)
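
The "last write wins" rule can be sketched as a merge function over timestamped values. The timestamps below are made up, and breaking ties by comparing values is a choice made here so that the merge gives the same result regardless of argument order; it is a sketch of the idea, not Cassandra's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # write time, e.g. in microseconds (hypothetical values)


def last_write_wins(a, b):
    """Keep the write with the higher timestamp; break ties by value."""
    return max(a, b, key=lambda v: (v.timestamp, v.value))


east = Versioned("online", timestamp=1000)   # User A, US East
west = Versioned("offline", timestamp=1005)  # User B, US West

winner = last_write_wins(east, west)
print(winner.value)  # offline
```

Note what this costs: the "online" write is silently discarded. Last write wins is simple and always converges, but it depends on reasonably synchronized clocks and it drops one of the conflicting updates.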

When to use: When availability is critical (social media, analytics, logging)

2. DynamoDB (AP)

Used by: Amazon shopping cart (the use case described in Amazon's original Dynamo paper, the design DynamoDB grew out of)

Why AP?

Black Friday: 100M users shopping
Network partition occurs (US East ↔ US West)

AP system:
  - Both regions continue working ✅
  - User in US East adds item to cart → Written to US East
  - User in US West removes item → Written to US West
  - Conflicting cart states ⚠️

When partition heals:
  - Merge carts (union of items)
  - Better to show extra item than lose sale ✅

Amazon's philosophy: Availability = revenue. Slight inconsistency acceptable.
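
The union merge is easy to sketch with Python sets. The item names are invented, and a real cart would track quantities and versions; this only shows why a removed item can come back after the merge.

```python
def merge_carts(cart_a, cart_b):
    """Reconcile two divergent carts by keeping every item seen on either side."""
    return cart_a | cart_b


before = {"book", "headphones"}

east = before | {"laptop"}      # US East: user adds a laptop
west = before - {"headphones"}  # US West: user removes the headphones

merged = merge_carts(east, west)
print(sorted(merged))  # ['book', 'headphones', 'laptop']
```

The added laptop survives, which protects the sale. The removed headphones reappear, which is the "extra item" the user may have to delete again.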

3. CouchDB (AP)

How it works:

Multi-master replication
All nodes accept writes
Conflicts resolved via versioning
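
To give a feel for version-based conflict handling, here is a simplified sketch. CouchDB requires an update to name the revision it is based on and rejects the update if that revision is no longer current. The `RevStore` class below is a hypothetical in-memory model of that idea on a single node, using plain integers where CouchDB uses revision strings.

```python
class Conflict(Exception):
    """Raised when an update is based on an outdated revision."""


class RevStore:
    def __init__(self):
        self.docs = {}  # doc_id -> (revision, body)

    def put(self, doc_id, body, rev=None):
        """Store body if rev matches the current revision; return the new revision."""
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self.docs[doc_id] = (new_rev, body)
        return new_rev


store = RevStore()
rev1 = store.put("status", "online")             # creates revision 1
rev2 = store.put("status", "offline", rev=rev1)  # based on 1, creates 2

try:
    store.put("status", "away", rev=rev1)        # still based on 1: rejected
    conflict_detected = False
except Conflict:
    conflict_detected = True

print(rev1, rev2, conflict_detected)  # 1 2 True
```

When two nodes accept writes independently, both versions survive replication as conflicting revisions instead of one being rejected up front, and the application can inspect them and decide how to merge.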

Scenario: User posts a status update

AP System (Facebook):

User posts: "Just got engaged!"
    ↓
Write to Database Node 1 (US East)
    ↓
Asynchronously replicate to Node 2 (US West)
    ↓
Immediately show post to user ✅

Network partition occurs:
    ↓
Replication delayed (500ms)
    ↓
Friend in US West views profile: Doesn't see post yet ⚠️
    ↓
500ms later: Post appears ✅

Result:
  - Available: User could post ✅
  - Consistent: Brief delay for global sync ⚠️ (eventual consistency)

Acceptable trade-off for social media. No one dies if they see a post 500ms late.

CA Systems (Consistency + Availability)

Wait, what about CA (Consistency + Availability without Partition Tolerance)?

Theoretical: Single-node databases (no network = no partitions)

Examples:

PostgreSQL (single instance)
MySQL (single instance)

Reality: Not realistic for large-scale systems

Single node = single point of failure
Can't scale horizontally
No geographic distribution

Used for: Small applications, development environments

CAP Theorem in Practice: Database Choices

Database	CAP Type	Real Use Case
PostgreSQL (single)	CA	Small apps, single data center
PostgreSQL (replicated)	CP-leaning	Banking, e-commerce (consistency critical)
MongoDB	CP-leaning	Financial data, inventory
Cassandra	AP	Analytics, social media, logging
DynamoDB	AP	Shopping cart, session storage
Redis (with Sentinel)	CP-leaning	Cache, real-time data
CouchDB	AP	Mobile apps, offline-first apps

How Systems Handle Partitions

CP System: Refuse Requests

MongoDB:

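from pymongo import MongoClient
from pymongo.errors import NotPrimaryError, ServerSelectionTimeoutError
# Hypothetical connection string and database name
db = MongoClient("mongodb://localhost:27017/?replicaSet=rs0").app
try:
    db.users.insert_one({"name": "John"})
except (NotPrimaryError, ServerSelectionTimeoutError):
    # NotPrimaryError was called NotMasterError before PyMongo 3.12
    print("Cannot write: primary unavailable")
    # Show error to user ❌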

User experience: "Service temporarily unavailable"

AP System: Accept Requests, Resolve Later

Cassandra:

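from cassandra.cluster import Cluster

# Hypothetical contact point and keyspace
session = Cluster(["127.0.0.1"]).connect("app")

# Write succeeds even during partition
session.execute("INSERT INTO users (id, name) VALUES (1, 'John')")

# Read might return stale data ⚠️
row = session.execute("SELECT * FROM users WHERE id = 1").one()
print(row)  # Might not include latest write yet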

User experience: Seems to work, but data might be stale temporarily

Eventual Consistency

Most AP systems use eventual consistency:

Definition: Given enough time (and no new writes), all nodes will converge to the same value

Example:

Write at t=0: "status = online" (Node 1)
Read at t=50ms: "status = offline" (Node 2, old value) ⚠️
Read at t=200ms: "status = online" (Node 2, updated) ✅

Typical time to consistency (rough figures that vary with load, topology, and configuration):

Cassandra: 100-500ms
DynamoDB: Usually within a second
Couchbase: Configurable

Trade-Off: Tunable Consistency

Some databases let you choose per-query:

Cassandra Consistency Levels

Write consistency:

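from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# `session` is a connected Session and `query` is a CQL string.
# The consistency level is set on the statement, not passed to execute().

# Require acknowledgment from ALL replicas (strong consistency)
session.execute(SimpleStatement(query, consistency_level=ConsistencyLevel.ALL))  # CP-like

# Require acknowledgment from 1 replica (high availability)
session.execute(SimpleStatement(query, consistency_level=ConsistencyLevel.ONE))  # AP-like

# Require quorum (majority of replicas)
session.execute(SimpleStatement(query, consistency_level=ConsistencyLevel.QUORUM))  # Balanced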

Trade-off:

ALL: Slow, consistent ✅, unavailable during partition ❌
ONE: Fast ✅, available ✅, inconsistent ⚠️
QUORUM: Balanced
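
A common way to reason about these levels is the overlap rule: with N replicas, a read that contacts R replicas is guaranteed to see the latest write acknowledged by W replicas when R + W > N. The helper names below are hypothetical; the sketch only does the arithmetic.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_see_latest_write(n, w, r):
    """True when every read set must overlap every write set."""
    return r + w > n


def replicas_that_may_be_down(n, required):
    """How many replicas can be unreachable while the operation still succeeds."""
    return n - required


N = 3  # replication factor (assumed for this example)

print(reads_see_latest_write(N, w=1, r=1))                  # False: ONE / ONE
print(reads_see_latest_write(N, w=quorum(N), r=quorum(N)))  # True: QUORUM / QUORUM
print(reads_see_latest_write(N, w=N, r=1))                  # True: ALL / ONE
print(replicas_that_may_be_down(N, quorum(N)))              # 1
print(replicas_that_may_be_down(N, N))                      # 0
```

This is why QUORUM on both reads and writes is the usual "balanced" setting: reads overlap with writes, yet one of three replicas can be down. ALL gives the same overlap but tolerates no failures.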

System Design Interview Tips

Common Question: "Explain CAP theorem"

Good answer:

"CAP theorem states that in a distributed system, you can only guarantee 2 of 3:
  - Consistency: All nodes see same data
  - Availability: System always responds
  - Partition tolerance: Works despite network failures

Since network failures are inevitable, the real choice is CP vs AP.

CP systems (MongoDB, HBase): Prioritize consistency, sacrifice availability during partitions
AP systems (Cassandra, DynamoDB): Prioritize availability, accept eventual consistency

Choice depends on use case:
  - Banking: CP (consistency critical)
  - Social media: AP (availability critical)"

Follow-Up: "Design a shopping cart"

Answer:

"I'd choose AP (like Amazon's DynamoDB):

Reason:
  - Availability is critical (users must add to cart, even during partition)
  - Slight inconsistency acceptable (if user adds item on phone and tablet simultaneously, merge both)
  - Losing a sale due to "service unavailable" is worse than showing extra item

Trade-off:
  - Possible duplicate items in cart (rare, user can remove)
  - Revenue > perfect consistency"

What to Mention

✅ Network partitions are inevitable (not optional)
✅ CP vs AP depends on use case
✅ Give specific database examples (MongoDB = CP, Cassandra = AP)
✅ Explain trade-offs (consistency vs availability)
✅ Mention eventual consistency for AP systems

Avoid These Mistakes

❌ Saying "We'll use CA" (not realistic for distributed systems)
❌ Not explaining the trade-off (why CP or AP for your use case?)
❌ Claiming "NoSQL is faster" (it's about consistency model, not speed)
❌ Ignoring partition scenarios (always discuss what happens during network failure)

Beyond CAP: PACELC Theorem

CAP is simplified. Reality is more nuanced:

PACELC Theorem:

If Partition (P): Choose Availability (A) or Consistency (C)
Else (E): Choose Latency (L) or Consistency (C)

Meaning: Even without partitions, there's a trade-off between latency and consistency.

Example:

Strong consistency:
  - Write to Node 1
  - Wait for Node 2, Node 3 to acknowledge
  - High latency (100ms) ⚠️

Eventual consistency:
  - Write to Node 1
  - Return immediately
  - Low latency (10ms) ✅
  - But temporary inconsistency
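
The latency gap can be worked through with simple arithmetic. The sketch assumes replicas are contacted in parallel, so a synchronous write waits for the slowest acknowledgment. The millisecond values are invented to match the example above.

```python
def sync_write_latency(local_ms, replica_ack_ms):
    """Strong consistency: wait for the local write plus the slowest replica ack."""
    return local_ms + max(replica_ack_ms)


def async_write_latency(local_ms):
    """Eventual consistency: return as soon as the local write is done."""
    return local_ms


local_write = 10         # ms to write on Node 1 (assumed)
replica_acks = [60, 90]  # ms for Node 2 and Node 3 to acknowledge (assumed)

print(sync_write_latency(local_write, replica_acks))  # 100
print(async_write_latency(local_write))               # 10
```

The slowest replica sets the pace for a strongly consistent write, which is why adding a distant replica raises write latency even when the network is perfectly healthy.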

Conclusion

CAP Theorem is fundamental to distributed systems:

Consistency: All nodes see same data
Availability: System always responds
Partition Tolerance: Works despite network failures

The real choice: CP vs AP

CP (Consistency + Partition Tolerance):

Sacrifice availability during partitions
Use for: Banking, payments, inventory
Examples: MongoDB, PostgreSQL (replicated), Redis

AP (Availability + Partition Tolerance):

Sacrifice consistency (eventual consistency)
Use for: Social media, shopping cart, analytics
Examples: Cassandra, DynamoDB, CouchDB

The secret? There's no "best" choice. Only trade-offs based on your requirements.

Banking system? Choose CP (money can't disappear). Social media? Choose AP (users must always be able to post).

Master CAP theorem, and you'll make better database choices—and ace system design interviews.
