CAP Theorem Explained Simply: Consistency, Availability, Partition Tolerance

Understand CAP theorem for system design interviews. Learn consistency, availability, partition tolerance with real examples from DynamoDB, MongoDB, PostgreSQL, and practical trade-offs

πŸ“… Published: February 14, 2025 ✏️ Updated: March 22, 2025 By Ojaswi Athghara
#cap-theorem #distributed-systems #consistency #availability #system-design

CAP Theorem Explained Simply: Consistency, Availability, Partition Tolerance

The Impossible Triangle: Pick Two, You Can't Have Three

You want your distributed database to be:

  1. Consistent (all users see the same data)
  2. Available (system always responds to requests)
  3. Partition Tolerant (works even when network fails)

CAP Theorem says: You can only pick 2 out of 3.

It's like this triangleβ€”you can't have all three corners simultaneously:

        Consistency
             /\
            /  \
           /    \
          /  CA  \
         /        \
        /----------\
Availability     Partition
                 Tolerance

This isn't a theoryβ€”it's a fundamental law of distributed systems, proven mathematically in 2002 by Eric Brewer.

Instagram chooses CP (consistency over availability during network partitions). Amazon chooses AP (availability over consistencyβ€”you might see stale product info, but checkout always works).

There's no right answer. Only trade-offs.

In this guide, I'll explain CAP theorem simply, show real-world examples, and help you understand which databases choose whatβ€”and why. Let's dive in.


What Is CAP Theorem?

CAP Theorem = In a distributed system, you can guarantee at most 2 of 3 properties:

  • Consistency (C)
  • Availability (A)
  • Partition Tolerance (P)

Proven by: Eric Brewer (2000), formally proven by Seth Gilbert and Nancy Lynch (2002)


The Three Properties

1. Consistency (C)

Definition: All nodes see the same data at the same time

In practice: After a write, all subsequent reads return the updated value

Example:

Consistent system:

User updates profile picture
    ↓
Write to Database Node 1
    ↓
Synchronously replicate to Node 2, Node 3
    ↓
Once all nodes updated: Return success
    ↓
Next read from any node: Returns new picture βœ…

Non-consistent (eventual consistency):

User updates profile picture
    ↓
Write to Database Node 1
    ↓
Asynchronously replicate to Node 2, Node 3 (takes 100ms)
    ↓
Return success immediately
    ↓
Read from Node 2 (before replication): Returns old picture ❌
Wait 100ms, read again: Returns new picture βœ…

Real Example: Banking

You transfer $100 from Account A to Account B:

Consistent:
  Account A: $500 β†’ $400 (immediate)
  Account B: $200 β†’ $300 (immediate)
  Any ATM, any location: Shows $400 and $300 βœ…

Eventual consistency:
  Account A: $500 β†’ $400 (immediate)
  Account B: $200 β†’ $300 (after 1 second)
  ATM on East Coast: Shows $400 and $200 (inconsistent!) ❌
  1 second later: Shows $400 and $300 βœ…

For banking, consistency is mandatory.


2. Availability (A)

Definition: Every request receives a response (success or failure), without guarantee it's the most recent data

In practice: System is always up, always responding (even if data is stale)

Example:

Available system:

User requests profile
    ↓
Server 1 is down ❌
    ↓
Request routed to Server 2
    ↓
Server 2 responds (maybe with cached/stale data)
    ↓
User gets a response βœ… (even if not perfectly up-to-date)

Unavailable system:

User requests profile
    ↓
Server 1 is down ❌
    ↓
System requires Server 1 for consistency
    ↓
Return error: "Service unavailable" ❌

Real Example: Amazon Shopping

Black Friday, millions of users:

Available system:
  - Show product info (even if stock count slightly stale)
  - Allow adding to cart
  - Process checkout
  - Better to show "possibly 5 left" than "Site down"

Unavailable system:
  - Wait for perfect stock count from all warehouses
  - If one warehouse offline: Show "Service unavailable"
  - Lose sales ❌

Amazon chooses availability (you might see stale stock, but site works).


3. Partition Tolerance (P)

Definition: System continues to operate despite network failures (network partition)

Network partition = Servers can't communicate (due to network failure, router issue, cable cut)

Example:

Partition-tolerant system:

        [Node 1 - US East]
              ⚑ Network partition! ⚑
        [Node 2 - US West]

Partition occurs (network failure)
    ↓
Node 1 and Node 2 can't communicate
    ↓
Both nodes continue accepting requests independently βœ…
    ↓
When network heals: Reconcile differences

Not partition-tolerant:

        [Node 1 - US East]
              ⚑ Network partition! ⚑
        [Node 2 - US West]

Partition occurs
    ↓
System stops accepting writes (to maintain consistency)
    ↓
Users see errors ❌

Reality: In distributed systems, network failures are inevitable.

  • Cables get cut
  • Routers fail
  • Data centers lose connectivity
  • Cloud providers have outages

Therefore: Partition tolerance is NOT optional. You must handle it.


The Real Choice: CP vs AP

Since partition tolerance is mandatory (networks fail), the real choice is:

CP (Consistency + Partition Tolerance) vs AP (Availability + Partition Tolerance)


CP Systems (Consistency + Partition Tolerance)

Trade-off: Sacrifice availability during network partitions

During partition:

Network partition occurs
    ↓
System cannot guarantee consistency across nodes
    ↓
Reject writes (or reads) until partition heals
    ↓
Users see errors ❌ (temporarily unavailable)

Once partition heals:

Nodes reconnect
    ↓
Data consistent across all nodes βœ…
    ↓
System resumes normal operation

CP Databases

1. MongoDB (CP-leaning)

How it works:

MongoDB Replica Set:
  - 1 Primary (handles writes)
  - 2 Secondaries (replicas)

Network partition:
  - Primary becomes isolated (can't reach majority)
  - Primary steps down (no longer accepts writes) ❌
  - Secondaries elect new primary
  - Writes go to new primary βœ…

Result:

  • Consistency: All reads return latest data
  • Availability: During partition (10-30 seconds), writes are unavailable ❌

When to use: When consistency is critical (financial data, inventory)


2. HBase (CP)

Used by: Facebook Messages

During partition:

  • Region server loses connection to master
  • Stops serving requests until reconnected
  • Ensures consistency

3. Redis (with sentinel, CP-leaning)

How it works:

Network partition:
  - Master isolated
  - Sentinels detect failure (quorum vote)
  - Promote slave to master
  - During failover (10-30 seconds): Writes unavailable ❌

Real-World Example: Bank Transfer (CP System)

Scenario: Transfer $100 from Account A to Account B

CP System (PostgreSQL with synchronous replication):

Step 1: Deduct $100 from Account A
Step 2: Add $100 to Account B
Step 3: Commit transaction (only if both nodes acknowledge)

Network partition occurs during Step 2:
  - Node 2 unreachable
  - Transaction cannot complete
  - Rollback (Account A unchanged)
  - User sees error: "Transaction failed, try again" ❌

Result:
  - Consistent: Money not lost βœ…
  - Not available: Transaction couldn't complete ❌

Better than: Money disappearing!


AP Systems (Availability + Partition Tolerance)

Trade-off: Sacrifice consistency during network partitions

During partition:

Network partition occurs
    ↓
Nodes accept writes independently (can't sync)
    ↓
Data diverges (different values on different nodes) ⚠️
    ↓
Users get responses βœ… (but possibly stale data)

Once partition heals:

Nodes reconnect
    ↓
Reconcile differences (conflict resolution)
    ↓
Eventually consistent βœ…

AP Databases

1. Cassandra (AP)

How it works:

Cassandra Cluster: 10 nodes

Network partition:
  - Node 1-5 in US East (can't reach US West)
  - Node 6-10 in US West (can't reach US East)
  - Both sides continue accepting writes βœ…

User A writes to US East: "status = online"
User B writes to US West: "status = offline"

Result: Conflicting data ⚠️

When partition heals:
  - Cassandra uses "last write wins" (timestamp-based)
  - Conflict resolved

Result:

  • Available: Always accepting requests βœ…
  • Consistent: During partition, data diverges ❌ (eventually consistent)

When to use: When availability is critical (social media, analytics, logging)


2. DynamoDB (AP)

Used by: Amazon shopping cart

Why AP?

Black Friday: 100M users shopping
Network partition occurs (US East ↔ US West)

AP system:
  - Both regions continue working βœ…
  - User in US East adds item to cart β†’ Written to US East
  - User in US West removes item β†’ Written to US West
  - Conflicting cart states ⚠️

When partition heals:
  - Merge carts (union of items)
  - Better to show extra item than lose sale βœ…

Amazon's philosophy: Availability = revenue. Slight inconsistency acceptable.


3. CouchDB (AP)

How it works:

  • Multi-master replication
  • All nodes accept writes
  • Conflicts resolved via versioning

Real-World Example: Social Media (AP System)

Scenario: User posts a status update

AP System (Facebook):

User posts: "Just got engaged!"
    ↓
Write to Database Node 1 (US East)
    ↓
Asynchronously replicate to Node 2 (US West)
    ↓
Immediately show post to user βœ…

Network partition occurs:
    ↓
Replication delayed (500ms)
    ↓
Friend in US West views profile: Doesn't see post yet ⚠️
    ↓
500ms later: Post appears βœ…

Result:
  - Available: User could post βœ…
  - Consistent: Brief delay for global sync ⚠️ (eventual consistency)

Acceptable trade-off for social media. No one dies if they see a post 500ms late.


CA Systems (Consistency + Availability)

Wait, what about CA (Consistency + Availability without Partition Tolerance)?

Theoretical: Single-node databases (no network = no partitions)

Examples:

  • PostgreSQL (single instance)
  • MySQL (single instance)

Reality: Not realistic for large-scale systems

  • Single node = single point of failure
  • Can't scale horizontally
  • No geographic distribution

Used for: Small applications, development environments


CAP Theorem in Practice: Database Choices

DatabaseCAP TypeReal Use Case
PostgreSQL (single)CASmall apps, single data center
PostgreSQL (replicated)CP-leaningBanking, e-commerce (consistency critical)
MongoDBCP-leaningFinancial data, inventory
CassandraAPAnalytics, social media, logging
DynamoDBAPShopping cart, session storage
RedisCP (with sentinel)Cache, real-time data
CouchDBAPMobile apps, offline-first apps

How Systems Handle Partitions

CP System: Refuse Requests

MongoDB:

try:
    db.users.insert_one({"name": "John"})
except pymongo.errors.NotMasterError:
    print("Cannot write: primary unavailable")
    # Show error to user ❌

User experience: "Service temporarily unavailable"


AP System: Accept Requests, Resolve Later

Cassandra:

# Write succeeds even during partition
session.execute("INSERT INTO users (id, name) VALUES (1, 'John')")

# Read might return stale data ⚠️
result = session.execute("SELECT * FROM users WHERE id = 1")
print(result)  # Might not include latest write yet

User experience: Seems to work, but data might be stale temporarily


Eventual Consistency

Most AP systems use eventual consistency:

Definition: Given enough time (and no new writes), all nodes will converge to the same value

Example:

Write at t=0: "status = online" (Node 1)
Read at t=50ms: "status = offline" (Node 2, old value) ⚠️
Read at t=200ms: "status = online" (Node 2, updated) βœ…

Time to consistency:

  • Cassandra: 100-500ms
  • DynamoDB: 1-2 seconds (default)
  • Couchbase: Configurable

Trade-Off: Tunable Consistency

Some databases let you choose per-query:

Cassandra Consistency Levels

Write consistency:

# Require acknowledgment from ALL nodes (strong consistency)
session.execute(query, consistency_level=ConsistencyLevel.ALL)  # CP-like

# Require acknowledgment from 1 node (high availability)
session.execute(query, consistency_level=ConsistencyLevel.ONE)  # AP-like

# Require quorum (majority of nodes)
session.execute(query, consistency_level=ConsistencyLevel.QUORUM)  # Balanced

Trade-off:

  • ALL: Slow, consistent βœ…, unavailable during partition ❌
  • ONE: Fast βœ…, available βœ…, inconsistent ⚠️
  • QUORUM: Balanced

System Design Interview Tips

Common Question: "Explain CAP theorem"

Good answer:

"CAP theorem states that in a distributed system, you can only guarantee 2 of 3:
  - Consistency: All nodes see same data
  - Availability: System always responds
  - Partition tolerance: Works despite network failures

Since network failures are inevitable, the real choice is CP vs AP.

CP systems (MongoDB, HBase): Prioritize consistency, sacrifice availability during partitions
AP systems (Cassandra, DynamoDB): Prioritize availability, accept eventual consistency

Choice depends on use case:
  - Banking: CP (consistency critical)
  - Social media: AP (availability critical)"

Follow-Up: "Design a shopping cart"

Answer:

"I'd choose AP (like Amazon's DynamoDB):

Reason:
  - Availability is critical (users must add to cart, even during partition)
  - Slight inconsistency acceptable (if user adds item on phone and tablet simultaneously, merge both)
  - Losing a sale due to "service unavailable" is worse than showing extra item

Trade-off:
  - Possible duplicate items in cart (rare, user can remove)
  - Revenue > perfect consistency"

What to Mention

βœ… Network partitions are inevitable (not optional) βœ… CP vs AP depends on use case βœ… Give specific database examples (MongoDB = CP, Cassandra = AP) βœ… Explain trade-offs (consistency vs availability) βœ… Mention eventual consistency for AP systems


Avoid These Mistakes

❌ Saying "We'll use CA" (not realistic for distributed systems) ❌ Not explaining the trade-off (why CP or AP for your use case?) ❌ Claiming "NoSQL is faster" (it's about consistency model, not speed) ❌ Ignoring partition scenarios (always discuss what happens during network failure)


Beyond CAP: PACELC Theorem

CAP is simplified. Reality is more nuanced:

PACELC Theorem:

  • If Partition (P): Choose Availability (A) or Consistency (C)
  • Else (E): Choose Latency (L) or Consistency (C)

Meaning: Even without partitions, there's a trade-off between latency and consistency.

Example:

Strong consistency:
  - Write to Node 1
  - Wait for Node 2, Node 3 to acknowledge
  - High latency (100ms) ⚠️

Eventual consistency:
  - Write to Node 1
  - Return immediately
  - Low latency (10ms) βœ…
  - But temporary inconsistency

Conclusion

CAP Theorem is fundamental to distributed systems:

  • Consistency: All nodes see same data
  • Availability: System always responds
  • Partition Tolerance: Works despite network failures

The real choice: CP vs AP

CP (Consistency + Partition Tolerance):

  • Sacrifice availability during partitions
  • Use for: Banking, payments, inventory
  • Examples: MongoDB, PostgreSQL (replicated), Redis

AP (Availability + Partition Tolerance):

  • Sacrifice consistency (eventual consistency)
  • Use for: Social media, shopping cart, analytics
  • Examples: Cassandra, DynamoDB, CouchDB

The secret? There's no "best" choice. Only trade-offs based on your requirements.

Banking system? Choose CP (money can't disappear). Social media? Choose AP (users must always be able to post).

Master CAP theorem, and you'll make better database choicesβ€”and ace system design interviews.


Cover image by Yang Deng on Unsplash

Support My Work

If this guide helped you learn something new, solve a problem, or ace your interviews, I'd really appreciate your support! Creating comprehensive, free content like this takes significant time and effort. Your support helps me continue sharing knowledge and creating more helpful resources for developers and students.

Buy me a Coffee

Every contribution, big or small, means the world to me and keeps me motivated to create more content!

Related Blogs

Ojaswi Athghara

SDE, 4+ Years

Β© ojaswiat.com 2025-2027