Database Design in System Design: SQL vs NoSQL, Sharding, Replication

Master database design for system design interviews. Learn SQL vs NoSQL, database sharding, replication, indexing with real examples from Facebook, Twitter, Uber

πŸ“… Published: July 20, 2025 ✏️ Updated: August 14, 2025 By Ojaswi Athghara
#database #sql #nosql #sharding #replication

Database Design in System Design: SQL vs NoSQL, Sharding, Replication

"NoSQL Is Always Faster Than SQL" – The Expensive Myth

I've heard this in dozens of system design interviews: "Let's use MongoDB because it's faster than PostgreSQL."

Wrong.

Instagram serves billions of photosβ€”PostgreSQL. Uber tracks millions of ridesβ€”PostgreSQL. Discord handles billions of messagesβ€”their custom database built on... Cassandra? No, they actually use Scylla, but store critical data in PostgreSQL.

The truth? Database choice isn't about "faster" or "slower." It's about matching your data model, access patterns, and consistency requirements to the right storage system.

In this guide, I'll break down SQL vs NoSQL, when to use each, sharding, replication, and how companies like Facebook, Twitter, and Uber design their databases. Let's dive in.


Types of Databases

1. SQL (Relational) Databases

Examples: PostgreSQL, MySQL, Oracle, SQL Server

Data Model: Tables with rows and columns

Example:

-- Users table
users
+----+----------+------------------+
| id | username | email            |
+----+----------+------------------+
| 1  | john_doe | john@example.com |
| 2  | jane_doe | jane@example.com |
+----+----------+------------------+

-- Posts table
posts
+----+---------+------------------+------------+
| id | user_id | content          | created_at |
+----+---------+------------------+------------+
| 1  | 1       | Hello world!     | 2025-01-01 |
| 2  | 1       | Learning SQL     | 2025-01-02 |
| 3  | 2       | NoSQL vs SQL     | 2025-01-03 |
+----+---------+------------------+------------+

-- Relationships
Foreign key: posts.user_id β†’ users.id
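This schema is easy to try in an in-memory SQLite database; a sketch mirroring the tables above (SQLite here is just for illustration):

```python
import sqlite3

# In-memory database mirroring the users/posts example above
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FK enforcement
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.execute("""CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),  -- foreign key: posts.user_id -> users.id
    content TEXT,
    created_at TEXT)""")

conn.execute("INSERT INTO users VALUES (1, 'john_doe', 'john@example.com')")
conn.execute("INSERT INTO posts VALUES (1, 1, 'Hello world!', '2025-01-01')")

# A JOIN walks the relationship between the two tables
row = conn.execute("""
    SELECT users.username, posts.content
    FROM posts JOIN users ON posts.user_id = users.id
""").fetchone()
print(row)  # ('john_doe', 'Hello world!')
```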

Characteristics: βœ… ACID transactions (Atomicity, Consistency, Isolation, Durability) βœ… Structured schema (enforced data integrity) βœ… Complex queries with JOINs βœ… Mature ecosystem (40+ years old)

When to use:

  • Financial systems (banking, payments)
  • E-commerce (orders, inventory)
  • Applications with complex relationships
  • Need for strong consistency

Real Example: Instagram

Instagram uses PostgreSQL for:

  • User profiles
  • Relationships (followers/following)
  • Posts metadata
  • Comments

Why PostgreSQL?

  • Complex queries (fetch all posts from followed users)
  • Transactional integrity (ensure consistency)
  • Proven scalability (with sharding)

2. NoSQL Databases

"Not only SQL" – Multiple types:

a) Document Stores (MongoDB, Couchbase)

Data Model: JSON-like documents

Example:

{
  "_id": "user_123",
  "username": "john_doe",
  "email": "john@example.com",
  "profile": {
    "bio": "Software developer",
    "location": "San Francisco"
  },
  "posts": [
    {
      "content": "Hello world!",
      "created_at": "2025-01-01",
      "likes": 42
    }
  ]
}

Characteristics: βœ… Flexible schema (no predefined structure) βœ… Nested data (no JOINs needed) βœ… Horizontal scaling (sharding built-in)

When to use:

  • Rapidly changing schema
  • Nested/hierarchical data
  • Content management systems
  • Catalogs with varying attributes

Real Example: Facebook Messages

Facebook uses HBase (technically a column-family store, not a document store, but with similar schema flexibility) for messaging:

  • Billions of messages
  • Schema flexibility (attachments, reactions, etc.)
  • Write-heavy workload

b) Key-Value Stores (Redis, DynamoDB, Riak)

Data Model: Simple key β†’ value

Example:

# Redis
"session:abc123" β†’ {"user_id": 1, "logged_in": true}
"user:1:cart" β†’ ["product_1", "product_2"]
"rate_limit:user_1" β†’ 45

Characteristics: βœ… Extremely fast (in-memory) βœ… Simple operations (GET, SET, DELETE) βœ… TTL support (auto-expiration)

When to use:

  • Caching
  • Session storage
  • Real-time analytics (counters, leaderboards)
  • Rate limiting
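The TTL behavior is simple to sketch: a toy in-memory store (nothing like Redis's actual implementation) that lazily drops expired keys. The injectable clock is only there to make expiry deterministic:

```python
import time

class TTLStore:
    """Toy key-value store with per-key expiration (Redis-style TTL)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}     # key -> (value, expires_at or None)
        self._clock = clock

    def set(self, key, value, ttl=None):
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy expiration on read
            return None
        return value

# Usage with a fake clock so the expiry is visible
now = [0.0]
store = TTLStore(clock=lambda: now[0])
store.set("session:abc123", {"user_id": 1}, ttl=30)
now[0] = 10.0
print(store.get("session:abc123"))  # {'user_id': 1} — still alive
now[0] = 31.0
print(store.get("session:abc123"))  # None — expired
```

Redis actually combines this lazy check with an active background sweep, but the key-level semantics are the same.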

Real Example: Twitter

Twitter uses Redis for:

  • Timeline caching
  • User session storage
  • Rate limiting (API calls per user)

c) Column-Family Stores (Cassandra, HBase, ScyllaDB)

Data Model: Rows with dynamic columns

Example:

Row key: user_123
  | Column family: profile
    | name: "John Doe"
    | email: "john@example.com"
  | Column family: posts
    | post_1: "Hello world!"
    | post_2: "Learning Cassandra"

Characteristics: βœ… Write-optimized (append-only) βœ… Massive scalability (petabytes) βœ… Tunable consistency

When to use:

  • Time-series data
  • High write throughput
  • Analytics / logging

Real Example: Netflix

Netflix uses Cassandra for:

  • Viewing history (billions of events)
  • User preferences
  • A/B testing data

Why Cassandra?

  • Handles 1+ million writes per second
  • Globally distributed (multi-region)
  • Always available (no single point of failure)

d) Graph Databases (Neo4j, Amazon Neptune)

Data Model: Nodes and edges (relationships)

Example:

(User: john_doe) -[:FOLLOWS]-> (User: jane_doe)
(User: john_doe) -[:LIKES]-> (Post: 123)
(Post: 123) -[:TAGGED]-> (Tag: #systemdesign)

When to use:

  • Social networks (friends, followers)
  • Recommendation engines
  • Fraud detection

Real Example: LinkedIn

LinkedIn uses graph database concepts for:

  • Connection recommendations ("People you may know")
  • 2nd/3rd degree connections
  • Shortest path between users
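Under the hood, "2nd/3rd degree connections" is breadth-first search over the graph. A minimal sketch with a hypothetical adjacency list:

```python
from collections import deque

def degrees_of_connection(graph, start, target):
    """BFS over an undirected connection graph; returns the number of
    hops between two users, or None if they aren't connected."""
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        user, depth = queue.popleft()
        for neighbor in graph.get(user, []):
            if neighbor == target:
                return depth + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return None

# Hypothetical connection graph
graph = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}
print(degrees_of_connection(graph, "alice", "dave"))  # 3 — a 3rd-degree connection
```

A dedicated graph database makes these traversals fast at scale because edges are stored as direct pointers rather than rows that need JOINs.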

SQL vs NoSQL: Decision Matrix

+------------------+-----------------------------+--------------------------------+
| Factor           | SQL                         | NoSQL                          |
+------------------+-----------------------------+--------------------------------+
| Schema           | Fixed, predefined           | Flexible, dynamic              |
| Transactions     | Full ACID ✅                | Limited (eventual consistency) |
| Scalability      | Vertical (harder to scale)  | Horizontal (built-in) ✅       |
| Complex queries  | JOINs, aggregations ✅      | Limited support                |
| Consistency      | Strong ✅                   | Eventual (tunable)             |
| Maturity         | 40+ years ✅                | ~15 years                      |
| Use case         | Banking, e-commerce         | Social media, analytics        |
+------------------+-----------------------------+--------------------------------+

Truth: Most large systems use both.


Database Scaling Techniques

1. Indexing

Problem: Query slow on large table

-- 10 million users
SELECT * FROM users WHERE email = 'john@example.com';
-- Scans entire table: 5 seconds ❌

Solution: Create index

CREATE INDEX idx_email ON users(email);

-- Now uses index: 5ms βœ…

How indexes work:

Think of a book:

  • Without index: Read every page to find "Chapter 5" (slow)
  • With index: Look at table of contents, jump to page 87 (fast)

Trade-off: βœ… Faster reads ❌ Slower writes (must update index)

Best practice: Index columns used in WHERE, ORDER BY, JOIN
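You can verify an index is actually used by inspecting the query plan. A SQLite sketch of the idea (PostgreSQL's EXPLAIN ANALYZE works similarly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

query = "SELECT * FROM users WHERE email = 'user500@example.com'"

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output describes the access strategy
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(plan(query))  # before: a full scan of the users table

conn.execute("CREATE INDEX idx_email ON users(email)")
print(plan(query))  # after: a search using idx_email
```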

Real Example: Facebook

Facebook heavily indexes:

  • User IDs
  • Email addresses
  • Phone numbers
  • Timestamps (for timeline queries)

2. Replication

Strategy: Copy data to multiple servers

Master-Slave Replication

        [Master DB]
     (Handles writes)
            ↓
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”
    ↓               ↓
[Slave 1]       [Slave 2]
(Handles reads) (Handles reads)

How it works:

  1. All writes β†’ Master
  2. Master replicates changes to Slaves
  3. Reads β†’ Distributed across Slaves

Benefits: βœ… Scale reads (add more slaves) βœ… High availability (if master fails, promote slave) βœ… Backup (slaves have full copy)

Drawbacks: ❌ Replication lag (slaves slightly behind master) ❌ Single write point (master is bottleneck)

Real Example: Reddit

Reddit uses master-slave replication:

  • Master: Handles posts, comments, votes (writes)
  • Slaves (10+): Serve home feed, subreddit pages (reads)

Master-Master Replication

[Master 1] ←→ [Master 2]
  (Writes)      (Writes)

Both databases accept writes, replicate to each other.

Benefits: βœ… No single write bottleneck βœ… High availability

Drawbacks: ❌ Write conflicts (complex to resolve) ❌ Consistency challenges

When to use: Multi-region applications needing low write latency everywhere


3. Sharding (Horizontal Partitioning)

Problem: Database too large for single server

Database: 1 TB
Single server: 256 GB RAM
    ↓
Cannot fit in memory
    ↓
Slow queries

Solution: Split data across multiple databases

Sharding Strategies

a) Range-Based Sharding

Shard 1: Users with ID 1 - 1,000,000
Shard 2: Users with ID 1,000,001 - 2,000,000
Shard 3: Users with ID 2,000,001 - 3,000,000

Pros: βœ… Simple to implement βœ… Easy to add new shard (extend range)

Cons: ❌ Uneven distribution (hotspots) ❌ Shard 1 might have 80% traffic


b) Hash-Based Sharding

import hashlib

def get_shard(user_id, num_shards):
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, which would scatter keys across restarts
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Illustrative examples — each key always maps to the same shard:
# user_123 → Shard 1
# user_456 → Shard 3
# user_789 → Shard 2

Pros: βœ… Even distribution βœ… No hotspots

Cons: ❌ Adding shards requires re-sharding all data ❌ Range queries difficult (data scattered)
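That re-sharding cost is easy to measure: with modulo hashing, a key keeps its shard only when its hash agrees modulo both the old and new shard counts, so growing from 4 to 5 shards moves roughly 80% of the keys. A quick check using a stable hash:

```python
import hashlib

def get_shard(key, num_shards):
    # Stable hash so the shard mapping is the same in every process
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

keys = [f"user_{i}" for i in range(10_000)]
moved = sum(get_shard(k, 4) != get_shard(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys change shards")  # roughly 80%
```

This is why systems that expect to grow often use consistent hashing instead, which moves only about 1/N of the keys per added shard.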


c) Geographic Sharding

US users β†’ US database
EU users β†’ EU database
Asia users β†’ Asia database

Pros: βœ… Low latency (data close to users) βœ… Regulatory compliance (GDPR)

Cons: ❌ Uneven load (if user distribution varies) ❌ Cross-region queries expensive

Real Example: Uber

Uber shards by city:

  • New York rides β†’ New York database
  • San Francisco rides β†’ SF database
  • London rides β†’ London database

Why?

  • Rides are geographically bounded
  • Most queries within single city
  • Rare cross-city queries

Sharding Challenges

Problem 1: Joins across shards

-- User in Shard 1, Posts in Shard 2
SELECT users.name, posts.content
FROM users JOIN posts ON users.id = posts.user_id;

-- Requires fetching data from multiple shards ❌

Solution: Denormalize (duplicate data to avoid cross-shard queries)


Problem 2: Auto-increment IDs

-- Shard 1 generates: 1, 2, 3, ...
-- Shard 2 generates: 1, 2, 3, ... (collision!)

Solutions:

  • Use UUID (globally unique, but large)
  • Twitter Snowflake (timestamp + shard ID + sequence)
  • Instagram ID generation (timestamp + shard ID)
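A simplified sketch of the Snowflake idea (the real layout is 41 bits of millisecond timestamp, 10 bits of machine/shard ID, and 12 bits of sequence; the custom epoch below is an arbitrary assumption, and real implementations wait out sequence overflow rather than wrapping):

```python
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01 UTC — an arbitrary custom epoch

class SnowflakeLite:
    """Simplified Snowflake-style IDs: timestamp | shard ID | sequence."""

    def __init__(self, shard_id):
        assert 0 <= shard_id < 1024  # shard ID fits in 10 bits
        self.shard_id = shard_id
        self.sequence = 0
        self.last_ms = -1

    def next_id(self, now_ms=None):
        ms = int(time.time() * 1000) - EPOCH_MS if now_ms is None else now_ms
        if ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
        else:
            self.sequence = 0
            self.last_ms = ms
        # Pack: 41 bits timestamp, 10 bits shard, 12 bits sequence
        return (ms << 22) | (self.shard_id << 12) | self.sequence

gen = SnowflakeLite(shard_id=7)
a = gen.next_id()
b = gen.next_id()
# IDs are unique per shard, roughly time-sortable, and the shard is
# recoverable from the ID itself: (id >> 12) & 0x3FF
```

Time-sortability is the big win over UUIDs: recent rows cluster together in the index, which keeps inserts fast.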

Data Storage Types

1. OLTP (Online Transaction Processing)

Purpose: Handle day-to-day transactions

Characteristics:

  • Fast writes/reads
  • Small queries (fetch single user, update order)
  • High concurrency (thousands of queries/sec)

Examples: User profiles, orders, inventory

Databases: MySQL, PostgreSQL, MongoDB

Real Example: Amazon Orders

When you place an order:

BEGIN TRANSACTION;
  INSERT INTO orders (user_id, total) VALUES (123, 49.99);
  UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 789;
  INSERT INTO payments (order_id, status) VALUES (456, 'pending');
COMMIT;

Fast, transactional, consistent.
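Atomicity is the point: if any statement fails, the whole transaction rolls back and no half-placed order survives. A SQLite sketch of the same pattern (table names mirror the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, total REAL)")
conn.execute("CREATE TABLE inventory (product_id INT, quantity INT CHECK (quantity >= 0))")
conn.execute("INSERT INTO inventory VALUES (789, 1)")  # one unit left

def place_order(user_id, product_id, total):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("INSERT INTO orders (user_id, total) VALUES (?, ?)",
                         (user_id, total))
            conn.execute("UPDATE inventory SET quantity = quantity - 1 "
                         "WHERE product_id = ?", (product_id,))
        return True
    except sqlite3.IntegrityError:
        return False  # rolled back: no order row, no inventory change

print(place_order(123, 789, 49.99))  # True — last unit sold
print(place_order(124, 789, 49.99))  # False — CHECK fails, whole txn rolls back
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1, not 2
```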


2. OLAP (Online Analytical Processing)

Purpose: Analyze historical data

Characteristics:

  • Complex queries (aggregations, JOINs)
  • Large datasets (billions of rows)
  • Read-heavy

Examples: Business intelligence, reports, dashboards

Databases: Snowflake, Google BigQuery, Amazon Redshift

Real Example: Netflix Analytics

Netflix analyzes billions of viewing events:

-- "What shows are most popular in India?"
SELECT show_name, COUNT(*) as views
FROM viewing_history
WHERE country = 'India' AND date >= '2025-01-01'
GROUP BY show_name
ORDER BY views DESC
LIMIT 10;

Query might take 30 secondsβ€”but that's okay, it's not real-time.


3. Data Warehouse

Purpose: Central repository for analytics

Architecture:

          [Data Warehouse]
                ↑
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    ↓           ↓           ↓
[OLTP DB 1] [OLTP DB 2] [Logs]
(Products)  (Users)     (Events)

Extract data from multiple sources, transform, load into warehouse (ETL).
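A toy version of that ETL flow, with two in-memory "sources" standing in for the OLTP databases (all names here are hypothetical):

```python
# Extract: rows from two hypothetical OLTP sources
users = [{"id": 1, "name": "Ada", "country": "UK"}]
orders = [{"order_id": 10, "user_id": 1, "total": 50.0},
          {"order_id": 11, "user_id": 1, "total": 15.0}]

# Transform: join and denormalize into one analytics-friendly row per order
users_by_id = {u["id"]: u for u in users}
warehouse = [
    {"order_id": o["order_id"],
     "total": o["total"],
     "user_name": users_by_id[o["user_id"]]["name"],
     "country": users_by_id[o["user_id"]]["country"]}
    for o in orders
]

# Load: in production these rows land in Snowflake/BigQuery/Redshift,
# where analysts aggregate them without touching the OLTP databases
revenue_by_country = {}
for row in warehouse:
    revenue_by_country[row["country"]] = (
        revenue_by_country.get(row["country"], 0) + row["total"])
print(revenue_by_country)  # {'UK': 65.0}
```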

Real Example: Airbnb

Airbnb consolidates:

  • Booking data (PostgreSQL)
  • Search logs (Elasticsearch)
  • User behavior (event tracking)

Into data warehouse (Presto) for analytics.


4. Data Lake

Purpose: Store raw, unstructured data

Difference from Data Warehouse:

  • Data Lake: Store everything (unprocessed)
  • Data Warehouse: Store structured, processed data

Real Example: Facebook

Facebook stores:

  • Photos (billions)
  • Videos
  • Logs
  • Everything

In data lake (Hadoop), process later as needed.


CAP Theorem (Briefly)

CAP Theorem: You can only choose 2 out of 3:

  • Consistency: All nodes see same data
  • Availability: Every request gets response
  • Partition Tolerance: System works despite network failures

Choices:

  • CP (Consistency + Partition Tolerance): MongoDB, HBase
  • AP (Availability + Partition Tolerance): Cassandra, DynamoDB
  • CA (Consistency + Availability): Not realistic (networks fail)

Most systems: Choose AP or CP based on needs


System Design Interview Tips

Common Question: "Design Twitter"

Database Strategy:

1. User Profile: PostgreSQL
   - Structured data
   - Relationships (followers)
   - ACID transactions

2. Tweets: Cassandra
   - Massive write volume (6,000 tweets/sec)
   - Time-series data
   - No complex queries

3. Timeline Cache: Redis
   - Pre-computed timelines
   - Fast access
   - TTL = 5 minutes

4. Analytics: Data Warehouse (BigQuery)
   - Aggregate metrics
   - Historical trends

Sharding:

Users: Hash-based sharding by user_id
Tweets: Range-based sharding by timestamp (recent tweets on hot shards)

Replication:

Master-slave replication:
  - 1 master (writes)
  - 5 slaves (reads)
  - Geographic distribution

What to Mention

βœ… SQL for structured data, NoSQL for unstructured/high-scale βœ… Replication for read scaling βœ… Sharding for write scaling βœ… Indexes for query optimization βœ… Cache layer (Redis) for hot data

Avoid These Mistakes

  • ❌ "Let's use NoSQL because it's faster" (no justification)
  • ❌ Not discussing replication/sharding
  • ❌ Ignoring consistency requirements
  • ❌ Not mentioning indexes


Practical Tips

1. Start with SQL

Unless you have a specific reason (massive scale, flexible schema), start with PostgreSQL or MySQL.

Why?

  • Mature, well-understood
  • Rich querying capabilities
  • Most applications fit relational model

Uber, Instagram, and Reddit all started with SQL.


2. Denormalize for Performance

Normalized (relational):

-- Fetch user's tweet with retweet count: 2 queries
SELECT * FROM tweets WHERE id = 123;
SELECT COUNT(*) FROM retweets WHERE tweet_id = 123;

Denormalized (fast):

tweets
+----+---------+----------------+
| id | content | retweet_count  |
+----+---------+----------------+
| 123| Hello   | 42             |
+----+---------+----------------+

-- Fetch in 1 query
SELECT * FROM tweets WHERE id = 123;

Trade-off: Update retweet_count on every retweet (extra write), but reads are faster.


3. Cache Aggressively

Pattern:

1. Check Redis cache
2. If miss: Query database
3. Store result in cache (TTL = 5 min)
4. Return data

Result: 90%+ requests served from cache (< 1ms), 10% from database (100ms)


4. Monitor Query Performance

Use:

  • PostgreSQL: EXPLAIN ANALYZE
  • MySQL: EXPLAIN
  • MongoDB: .explain()

Find slow queries, add indexes.


Conclusion

Database design is the foundation of scalable systems:

  • SQL: Structured data, transactions, complex queries
  • NoSQL: Flexibility, horizontal scaling, high throughput
  • Replication: Scale reads, high availability
  • Sharding: Scale writes, massive data
  • Indexing: Fast queries

Key takeaway: There's no "best" database. Match your data model and access patterns to the right storage system.

Instagram uses PostgreSQL. Facebook uses MySQL, Cassandra, HBase, and more. Twitter uses MySQL, Cassandra, and Redis.

Use the right tool for the job. Master the fundamentals, and you'll ace database design questions in system design interviewsβ€”and build systems that scale to billions of users.


Cover image by Kevin Ache on Unsplash

Support My Work

If this guide helped you learn something new, solve a problem, or ace your interviews, I'd really appreciate your support! Creating comprehensive, free content like this takes significant time and effort. Your support helps me continue sharing knowledge and creating more helpful resources for developers and students.

Buy me a Coffee

Every contribution, big or small, means the world to me and keeps me motivated to create more content!


Β© ojaswiat.com 2025-2027