Caching Strategies in System Design: Complete Guide with Examples
Master caching strategies for system design interviews. Learn cache invalidation, eviction policies (LRU, LFU), CDNs, and Redis patterns, with real examples from Facebook, Netflix, and YouTube.

Facebook Loads in 1.5 Seconds. Without Cache, It Would Take 30 Seconds.
That's 20x faster. And it's all because of caching.
Every time you open Facebook, you're not waiting for servers to fetch your profile, friends list, photos, and timeline posts from the database. Most of that data is already cached, sitting in memory, ready to be served in milliseconds instead of seconds.
The numbers are staggering:
- Facebook serves over 1 billion requests per second
- 90%+ are served from cache, not the database
- Without caching, they'd need 10x more database servers (millions in infrastructure costs)
Caching isn't just an optimization. For large-scale systems, it's the difference between functioning and collapsing.
In this guide, I'll break down caching strategies, eviction policies, cache invalidation, and real-world patterns used by companies like Netflix, Twitter, and YouTube. Let's dive in.
What Is Caching?
Caching = Storing frequently accessed data in a fast-access layer to avoid expensive operations.
Think of it like this:
- Without cache: Every time you need milk, you drive to the grocery store (slow, expensive)
- With cache: Keep milk in your fridge (fast, cheap)
Why Cache?
Problem: Databases are slow
- Disk I/O: 5-10ms per query
- Complex joins: 50-200ms
- At scale: Database becomes bottleneck
Solution: Cache layer
- Memory access: < 1ms
- Pre-computed results
- 10-100x faster
Where to Cache?
Browser → CDN → Load Balancer → App Server → Cache → Database
   ↓       ↓                        ↓          ↓
(Cache)  (Cache)                 (Cache)    (Cache)
Caching happens at multiple layers. Let's explore each.
Types of Caching
1. Browser Cache
Where: User's browser stores resources
What it caches:
- Images, CSS, JavaScript
- Static assets
- API responses (sometimes)
Real Example: YouTube
When you visit YouTube:
First visit:
- Logo image: Downloaded (300ms)
- CSS file: Downloaded (200ms)
- JS bundle: Downloaded (500ms)
Total: 1000ms
Second visit (cached):
- Logo image: From cache (5ms)
- CSS file: From cache (5ms)
- JS bundle: From cache (10ms)
Total: 20ms
50x faster!
How to implement:
// Server sends cache headers
res.setHeader('Cache-Control', 'public, max-age=31536000'); // Cache for 1 year
res.setHeader('ETag', 'v1.2.3'); // Version tag
Browser automatically caches based on these headers.
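On the server side, honoring conditional requests is what makes the ETag useful. Here's a minimal sketch using Flask (the route, file path, and version string are made up for illustration): if the browser already holds the current version, the server answers 304 Not Modified and skips sending the body.

from flask import Flask, Response, request

app = Flask(__name__)
ASSET_VERSION = '"v1.2.3"'  # ETag values are quoted strings

@app.route("/static/app.js")
def app_js():
    # The browser sends the ETag it has cached in the If-None-Match header
    if request.headers.get("If-None-Match") == ASSET_VERSION:
        return Response(status=304)  # Not Modified: browser reuses its cached copy
    body = open("static/app.js").read()
    resp = Response(body, mimetype="application/javascript")
    resp.headers["Cache-Control"] = "public, max-age=31536000"  # cache for 1 year
    resp.headers["ETag"] = ASSET_VERSION
    return resp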
2. CDN (Content Delivery Network)
Where: Edge servers distributed globally
What it caches:
- Static files (images, videos, CSS, JS)
- API responses (sometimes)
Real Example: Netflix
Netflix serves over 3 billion hours of video per month. Storing all that video centrally would be a disaster.
Without CDN:
User in Tokyo → Request video from US data center
        ↓
200ms latency + buffering
        ↓
Poor user experience
With CDN:
User in Tokyo → Nearest edge server (Tokyo)
        ↓
20ms latency + cached video
        ↓
Smooth streaming
Netflix CDN Strategy:
- Popular shows: Cached on edge servers worldwide
- Less popular content: Fetched on-demand, then cached
- Result: 90%+ of traffic served from edge (not origin)
Popular CDN providers:
- Cloudflare (used by millions of websites)
- AWS CloudFront
- Akamai
- Fastly
3. Application-Level Cache
Where: In-memory cache in your application (Redis, Memcached)
This is what most system design questions focus on.
Real Example: Twitter Timeline
When you open Twitter:
- Your timeline shows tweets from people you follow
- Computing this in real-time: Query 1,000 followed users, fetch latest tweets, sort, rank
- Expensive operation (100-500ms)
Solution: Cache the timeline
# Pseudo-code
def get_timeline(user_id):
    # Check cache first
    timeline = cache.get(f"timeline:{user_id}")
    if timeline:
        return timeline  # Cache hit (< 1ms)
    # Cache miss: compute timeline
    timeline = expensive_database_query(user_id)  # 200ms
    # Store in cache for 5 minutes
    cache.set(f"timeline:{user_id}", timeline, ttl=300)
    return timeline
Result:
- First request: 200ms (cache miss)
- Subsequent requests: < 1ms (cache hit)
- 200x faster!
4. Database Cache
Where: Database query results cached
Most modern databases have built-in caching:
- MySQL: Query cache
- PostgreSQL: Shared buffers
- MongoDB: WiredTiger cache
But for high-traffic applications, application-level caching (Redis) is usually better.
Caching Patterns
1. Cache-Aside (Lazy Loading)
Most common pattern
Flow:
1. Application checks cache
2. Cache miss β Query database
3. Store result in cache
4. Return data
Code Example:
def get_user(user_id):
    # Check cache
    user = cache.get(f"user:{user_id}")
    if user:
        return user  # Cache hit
    # Cache miss: query database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    # Store in cache (TTL = 1 hour)
    cache.set(f"user:{user_id}", user, ttl=3600)
    return user
Pros: ✓ Simple to implement ✓ Only requested data is cached (efficient) ✓ Cache failure doesn't bring down the app
Cons: ✗ Cache miss penalty (first request is slow) ✗ Stale data possible
When to use: Most read-heavy applications (Facebook profiles, product details, etc.)
2. Write-Through Cache
Flow:
1. Application writes to cache AND database simultaneously
2. Cache is always in sync with database
Code Example:
def update_user(user_id, data):
    # Update database
    db.query("UPDATE users SET ... WHERE id = ?", user_id)
    # Update cache immediately
    cache.set(f"user:{user_id}", data, ttl=3600)
    return data
Pros: ✓ Cache is never stale ✓ Read performance is consistent
Cons: ✗ Write latency (write to cache + database) ✗ Cache might store unused data
When to use: Banking, financial applications where consistency matters
3. Write-Back (Write-Behind) Cache
Flow:
1. Application writes to cache only
2. Cache asynchronously writes to database later
3. Super fast writes
Code Example:
def update_user(user_id, data):
    # Write to cache immediately
    cache.set(f"user:{user_id}", data, ttl=3600)
    # Mark for async database write
    queue.enqueue("db_write", {"user_id": user_id, "data": data})
    return data  # Return immediately

# Background worker
def process_db_writes():
    while True:
        task = queue.dequeue("db_write")
        db.query("UPDATE users SET ... WHERE id = ?", task["user_id"])
Pros: ✓ Super fast writes ✓ Reduces database load
Cons: ✗ Risk of data loss (cache failure before DB write) ✗ Complex to implement
When to use: High-write scenarios (analytics, logging, social media likes/views counters)
Real Example: YouTube view counter
- View count updated in cache immediately (fast)
- Database updated in batches every few minutes
- Result: Can handle billions of views without database overload
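As a rough sketch of that counter pattern (the key names, flush schedule, and db helper are illustrative, not YouTube's actual implementation): increments go straight to Redis, and a periodic job flushes the accumulated deltas to the database.

import redis

cache = redis.Redis(host="localhost", port=6379)

def record_view(video_id):
    # Fast path: bump an in-memory counter, no database touch
    cache.incr(f"views:{video_id}")

def flush_view_counts():
    # Background job, run every few minutes by a scheduler
    for key in cache.scan_iter("views:*"):
        video_id = key.decode().split(":", 1)[1]
        delta = int(cache.getset(key, 0) or 0)  # read the count and reset it
        if delta:
            db.query(
                "UPDATE videos SET view_count = view_count + ? WHERE id = ?",
                delta, video_id,
            )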
4. Read-Through Cache
Flow:
1. Application requests from cache
2. Cache automatically fetches from database if miss
3. Application doesn't manage cache logic
Pros: ✓ Simplified application logic ✓ Cache abstraction
Cons: ✗ Requires cache infrastructure that supports it ✗ Less control
When to use: Caching solutions with built-in loader support (e.g., Hazelcast, Amazon DynamoDB Accelerator)
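The pattern is also easy to emulate yourself by putting the cache behind a loader. A minimal sketch (the class and loader function are illustrative, not a specific library's API):

class ReadThroughCache:
    """A cache that knows how to load missing keys itself."""

    def __init__(self, cache, loader, ttl=3600):
        self.cache = cache    # e.g. a Redis client
        self.loader = loader  # function that fetches the value from the database
        self.ttl = ttl

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value  # cache hit
        # Cache miss: the cache layer, not the application, loads the data
        value = self.loader(key)
        self.cache.set(key, value, ttl=self.ttl)
        return value

# Application code only talks to the cache:
users = ReadThroughCache(cache, loader=lambda key: db.query(
    "SELECT * FROM users WHERE id = ?", key.split(":")[1]))
user = users.get("user:123")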
Cache Eviction Policies
Problem: Cache has limited memory. What happens when it's full?
Answer: Evict (remove) old data using an eviction policy.
1. LRU (Least Recently Used)
Rule: Remove data that hasn't been accessed in the longest time
Example:
Cache capacity: 3 items
Access: A → B → C → A → D
        ↓
Cache: [A, B, C] (full)
        ↓
Access D → Evict B (least recently used)
        ↓
Cache: [A, C, D]
Implementation (Python):
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return None
        # Move to end (most recently used)
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Remove first item (least recently used)
            self.cache.popitem(last=False)
When to use: Most common policy. Works well for general-purpose caching.
Real Example: Redis supports LRU eviction via the allkeys-lru policy (its default maxmemory-policy is noeviction, so caches typically switch it to allkeys-lru)
2. LFU (Least Frequently Used)
Rule: Remove data that has been accessed the fewest times
Example:
Cache capacity: 3 items
Access: A(3x) → B(1x) → C(2x) → D(1x)
        ↓
Cache: [A, B, C] (full)
        ↓
Access D → Evict B (least frequently used)
        ↓
Cache: [A, C, D]
When to use: When some data is consistently popular (trending videos, hot products)
Real Example: YouTube caching
- Popular videos (millions of views): Stay in cache
- One-time viewed videos: Evicted quickly
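A bare-bones LFU sketch for intuition (real implementations, such as Redis's approximate LFU, are more sophisticated about aging the counters):

from collections import Counter

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}
        self.freq = Counter()  # how many times each key has been accessed

    def get(self, key):
        if key not in self.cache:
            return None
        self.freq[key] += 1
        return self.cache[key]

    def put(self, key, value):
        if key not in self.cache and len(self.cache) >= self.capacity:
            # Evict the least frequently used key
            evict_key = min(self.cache, key=lambda k: self.freq[k])
            del self.cache[evict_key]
            del self.freq[evict_key]
        self.cache[key] = value
        self.freq[key] += 1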
3. FIFO (First In, First Out)
Rule: Remove oldest data first (like a queue)
Simple but not efficient for most use cases.
4. TTL (Time To Live)
Rule: Each item has expiration time. Remove when expired.
Example:
cache.set("user:123", data, ttl=3600) # Expires in 1 hour
After 1 hour, key automatically removed.
When to use: Combined with other policies. Almost always set TTL to prevent stale data.
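Under the hood, a TTL cache just stores an expiry timestamp next to each value and treats expired entries as misses. A simplified in-process sketch:

import time

class TTLCache:
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self.store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:
            del self.store[key]  # lazily expire on read
            return None
        return value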
Cache Invalidation
The hardest problem in computer science:
"There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton
The Problem
Database: user name = "John"
Cache: user name = "John"
        ↓
User updates name to "Jane"
        ↓
Database: user name = "Jane" ✓
Cache: user name = "John" ✗ (STALE!)
        ↓
Users see old name until cache expires
Solutions
1. TTL-Based Invalidation
Strategy: Set expiration time on cached data
cache.set("user:123", user, ttl=300) # Expires in 5 minutes
Pros: ✓ Simple ✓ Automatic cleanup
Cons: ✗ Data can be stale for up to the TTL duration ✗ Cache misses after expiration (performance hit)
When to use: Data that changes infrequently (product catalog, user profiles)
2. Manual Invalidation
Strategy: Explicitly delete cache when data changes
def update_user(user_id, data):
    # Update database
    db.query("UPDATE users SET ... WHERE id = ?", user_id)
    # Invalidate cache
    cache.delete(f"user:{user_id}")
Pros: ✓ Always consistent ✓ No stale data
Cons: ✗ Requires discipline (easy to forget) ✗ Next request is slow (cache miss)
3. Event-Based Invalidation
Strategy: Database triggers cache invalidation
# When user is updated, publish event
def update_user(user_id, data):
    db.query("UPDATE users SET ... WHERE id = ?", user_id)
    # Publish event
    event_bus.publish("user.updated", {"user_id": user_id})

# Cache service listens to events
def on_user_updated(event):
    cache.delete(f"user:{event['user_id']}")
Pros: ✓ Decoupled architecture ✓ Scalable
Cons: ✗ Complex ✗ Eventual consistency (slight delay)
When to use: Microservices architecture
Redis: The Most Popular Cache
Redis = Remote Dictionary Server
Why Redis?
- Fast: All data in memory
- Rich data structures: Strings, lists, sets, sorted sets, hashes
- Persistence: Can save to disk
- High availability: Redis Cluster, replication
Basic Redis Operations
import redis
# Connect to Redis
cache = redis.Redis(host='localhost', port=6379)
# Set value
cache.set("user:123", "John Doe")
# Get value
name = cache.get("user:123") # b'John Doe'
# Set with TTL
cache.setex("session:abc", 3600, "user_data") # Expires in 1 hour
# Delete
cache.delete("user:123")
# Check if exists
exists = cache.exists("user:123") # 0 (False) or 1 (True)
Advanced Redis Patterns
1. Caching Database Queries
import json
def get_products(category):
    cache_key = f"products:{category}"
    # Check cache
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    # Query database
    products = db.query("SELECT * FROM products WHERE category = ?", category)
    # Cache for 10 minutes
    cache.setex(cache_key, 600, json.dumps(products))
    return products
2. Rate Limiting
def check_rate_limit(user_id):
    key = f"rate_limit:{user_id}"
    # Increment counter
    requests = cache.incr(key)
    if requests == 1:
        # First request: set expiration (1 minute window)
        cache.expire(key, 60)
    if requests > 100:
        return False  # Rate limit exceeded
    return True  # Allow request
Real Example: Twitter API rate limiting (300 requests per 15 minutes)
3. Session Storage
def create_session(user_id):
    session_id = generate_random_id()
    # Store session data in Redis
    cache.setex(
        f"session:{session_id}",
        86400,  # 24 hours
        json.dumps({"user_id": user_id, "created_at": time.time()})
    )
    return session_id

def get_session(session_id):
    session = cache.get(f"session:{session_id}")
    return json.loads(session) if session else None
Why Redis for sessions?
- Fast access
- Automatic expiration (TTL)
- Shared across multiple servers (horizontal scaling)
Real-World Caching Examples
Instagram: Feed Generation
Problem: Generate personalized feed for 2 billion users
Solution:
1. Pre-compute feed for active users
2. Store in Redis cache
3. When user opens app:
- Fetch from cache (< 10ms)
- If cache miss: Generate on-demand (500ms)
4. Update cache when:
- New post from followed user
- User likes/comments
Result: Sub-second feed loading
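A rough sketch of that precomputed-feed idea, using one Redis list per user (the helpers get_follower_ids and expensive_feed_query are hypothetical, and real systems only fan out to recently active users):

import json

FEED_LIMIT = 500  # keep only the newest items per user

def fan_out_post(author_id, post):
    # On new post: push it onto each follower's cached feed
    for follower_id in get_follower_ids(author_id):
        key = f"feed:{follower_id}"
        cache.lpush(key, json.dumps(post))
        cache.ltrim(key, 0, FEED_LIMIT - 1)  # cap feed size

def get_feed(user_id, count=50):
    items = cache.lrange(f"feed:{user_id}", 0, count - 1)
    if items:
        return [json.loads(i) for i in items]  # cache hit: < 10ms
    # Cache miss (inactive user): generate on demand, then cache it
    feed = expensive_feed_query(user_id)
    for post in reversed(feed[:FEED_LIMIT]):
        cache.lpush(f"feed:{user_id}", json.dumps(post))
    return feed[:count]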
Amazon: Product Recommendations
Problem: Recommend products based on browsing history
Solution:
1. Cache "frequently bought together" for each product
2. Cache "customers who bought X also bought Y"
3. TTL: 1 hour (recommendations don't need real-time updates)
Result: Instant recommendations without complex database queries
Gmail: Email List
Problem: Fetch email list for millions of users
Solution:
1. Cache first 50 emails for each user
2. TTL: 5 minutes
3. On new email arrival:
- Invalidate cache
- Pre-compute new list
- Update cache
Result: Instant inbox loading
System Design Interview Tips
Common Questions
Q: "How would you design Twitter?"
Answer (caching strategy):
1. Cache user timeline (Redis)
- Key: "timeline:{user_id}"
- Value: List of tweet IDs
- TTL: 5 minutes
2. Cache tweet details (Redis)
- Key: "tweet:{tweet_id}"
- Value: Tweet object
- TTL: 1 hour (tweets rarely change)
3. Cache user profiles (Redis)
- Key: "user:{user_id}"
- Value: Profile object
- TTL: 30 minutes
4. CDN for images/videos
- Edge caching
- Long TTL (immutable content)
Things to Mention
- Multiple cache layers (browser, CDN, application, database)
- Eviction policy (usually LRU)
- Cache invalidation (TTL plus manual invalidation on updates)
- Cache-aside pattern (most common)
- Monitoring (cache hit rate, latency)
Avoid These Mistakes
- Not mentioning cache invalidation (shows lack of depth)
- Caching everything (cache only hot data)
- Ignoring consistency (mention eventual consistency trade-offs)
- Not considering cache failures (what if Redis goes down? see the sketch below)
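For that last point, a common answer is graceful degradation: treat cache errors like misses and fall through to the database, so a Redis outage slows the site down instead of taking it down. A minimal sketch, reusing the same illustrative cache, db, and log helpers as the earlier examples:

import redis

def get_user(user_id):
    try:
        user = cache.get(f"user:{user_id}")
        if user:
            return user  # cache hit
    except redis.RedisError:
        # Cache is down: note it, then keep serving (more slowly) from the database
        log.warning("cache unavailable, falling back to database")
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    try:
        cache.set(f"user:{user_id}", user, ttl=3600)
    except redis.RedisError:
        pass  # caching is best-effort; never fail the request because of it
    return user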
Conclusion
Caching is the secret weapon of scalable systems:
- Browser cache: Instant asset loading
- CDN: Global low-latency delivery
- Application cache (Redis): Fast data access
- Database cache: Reduced query load
Key takeaways:
- Cache at multiple layers
- Use cache-aside pattern for most applications
- LRU eviction policy is default (and usually best)
- Cache invalidation is hard: combine TTL with manual invalidation
- Monitor cache hit rate (aim for 80-90%+)
Remember: Without caching, Facebook would take 30 seconds to load. With caching, it takes 1.5 seconds.
That's the power of caching. Master it, and you'll ace system design interviews and build systems that actually scale.