Caching Strategies in System Design: Complete Guide with Examples
Master caching strategies for system design interviews. Learn cache invalidation, eviction policies (LRU, LFU), CDNs, and Redis patterns, with real examples from Facebook, Netflix, and YouTube.

Facebook Loads in 1.5 Seconds. Without Cache, It Would Take 30 Seconds.
That's 20x faster. And it's all because of caching.
Every time you open Facebook, you're not waiting for servers to fetch your profile, friends list, photos, and timeline posts from the database. Most of that data is already cached, sitting in memory, ready to be served in milliseconds instead of seconds.
The numbers are staggering:
- Facebook serves over 1 billion requests per second
- 90%+ are served from cache, not the database
- Without caching, they'd need 10x more database servers (millions in infrastructure costs)
Caching isn't just an optimization. For large-scale systems, it's the difference between functioning and collapsing.
In this guide, I'll break down caching strategies, eviction policies, cache invalidation, and real-world patterns used by companies like Netflix, Twitter, and YouTube. Let's dive in.
What Is Caching?
Caching = Storing frequently accessed data in a fast-access layer to avoid expensive operations.
Think of it like this:
- Without cache: Every time you need milk, you drive to the grocery store (slow, expensive)
- With cache: Keep milk in your fridge (fast, cheap)
Why Cache?
Problem: Databases are slow
- Disk I/O: 5-10ms per query
- Complex joins: 50-200ms
- At scale: Database becomes bottleneck
Solution: Cache layer
- Memory access: < 1ms
- Pre-computed results
- 10-100x faster
Where to Cache?
Browser → CDN → Load Balancer → App Server → Cache → Database
   ↓       ↓                        ↓          ↓
(Cache)  (Cache)                 (Cache)    (Cache)
Caching happens at multiple layers. Let's explore each.
Types of Caching
1. Browser Cache
Where: User's browser stores resources
What it caches:
- Images, CSS, JavaScript
- Static assets
- API responses (sometimes)
Real Example: YouTube
When you visit YouTube:
First visit:
- Logo image: Downloaded (300ms)
- CSS file: Downloaded (200ms)
- JS bundle: Downloaded (500ms)
Total: 1000ms
Second visit (cached):
- Logo image: From cache (5ms)
- CSS file: From cache (5ms)
- JS bundle: From cache (10ms)
Total: 20ms
50x faster!
How to implement:
// Server sends cache headers
res.setHeader('Cache-Control', 'public, max-age=31536000'); // Cache for 1 year
res.setHeader('ETag', 'v1.2.3'); // Version tag
Browser automatically caches based on these headers.
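On the server side, honoring conditional requests is what makes the ETag useful. Here's a minimal sketch using Flask (the route, file path, and version string are made up for illustration): if the browser already holds the current version, the server answers 304 Not Modified and skips sending the body.

from flask import Flask, Response, request

app = Flask(__name__)
ASSET_VERSION = '"v1.2.3"'  # ETag values are quoted strings

@app.route("/static/app.js")
def app_js():
    # The browser sends the ETag it has cached in the If-None-Match header
    if request.headers.get("If-None-Match") == ASSET_VERSION:
        return Response(status=304)  # Not Modified: browser reuses its cached copy
    body = open("static/app.js").read()
    resp = Response(body, mimetype="application/javascript")
    resp.headers["Cache-Control"] = "public, max-age=31536000"  # cache for 1 year
    resp.headers["ETag"] = ASSET_VERSION
    return resp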
2. CDN (Content Delivery Network)
Where: Edge servers distributed globally
What it caches:
- Static files (images, videos, CSS, JS)
- API responses (sometimes)
Real Example: Netflix
Netflix serves over 3 billion hours of video per month. Storing all that video centrally would be a disaster.
Without CDN:
User in Tokyo → Request video from US data center
        ↓
200ms latency + buffering
        ↓
Poor user experience
With CDN:
User in Tokyo → Nearest edge server (Tokyo)
        ↓
20ms latency + cached video
        ↓
Smooth streaming
Netflix CDN Strategy:
- Popular shows: Cached on edge servers worldwide
- Less popular content: Fetched on-demand, then cached
- Result: 90%+ of traffic served from edge (not origin)
Popular CDN providers:
- Cloudflare (used by millions of websites)
- AWS CloudFront
- Akamai
- Fastly
3. Application-Level Cache
Where: In-memory cache in your application (Redis, Memcached)
This is what most system design questions focus on.
Real Example: Twitter Timeline
When you open Twitter:
- Your timeline shows tweets from people you follow
- Computing this in real-time: Query 1,000 followed users, fetch latest tweets, sort, rank
- Expensive operation (100-500ms)
Solution: Cache the timeline
# Pseudo-code
def get_timeline(user_id):
    # Check cache first
    timeline = cache.get(f"timeline:{user_id}")
    if timeline:
        return timeline  # Cache hit (< 1ms)
    # Cache miss: compute timeline
    timeline = expensive_database_query(user_id)  # 200ms
    # Store in cache for 5 minutes
    cache.set(f"timeline:{user_id}", timeline, ttl=300)
    return timeline
Result:
- First request: 200ms (cache miss)
- Subsequent requests: < 1ms (cache hit)
- 200x faster!
4. Database Cache
Where: Database query results cached
Most modern databases have built-in caching:
- MySQL: Query cache
- PostgreSQL: Shared buffers
- MongoDB: WiredTiger cache
But for high-traffic applications, application-level caching (Redis) is usually better.
Caching Patterns
1. Cache-Aside (Lazy Loading)
Most common pattern
Flow:
1. Application checks cache
2. Cache miss β Query database
3. Store result in cache
4. Return data
Code Example:
def get_user(user_id):
    # Check cache
    user = cache.get(f"user:{user_id}")
    if user:
        return user  # Cache hit
    # Cache miss: query database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    # Store in cache (TTL = 1 hour)
    cache.set(f"user:{user_id}", user, ttl=3600)
    return user
Pros: ✓ Simple to implement ✓ Only requested data is cached (efficient) ✓ Cache failure doesn't bring down the app
Cons: ✗ Cache miss penalty (first request is slow) ✗ Stale data possible
When to use: Most read-heavy applications (Facebook profiles, product details, etc.)
2. Write-Through Cache
Flow:
1. Application writes to cache AND database simultaneously
2. Cache is always in sync with database
Code Example:
def update_user(user_id, data):
    # Update database
    db.query("UPDATE users SET ... WHERE id = ?", user_id)
    # Update cache immediately
    cache.set(f"user:{user_id}", data, ttl=3600)
    return data
Pros: ✓ Cache is never stale ✓ Read performance is consistent
Cons: ✗ Write latency (write to cache + database) ✗ Cache might store unused data
When to use: Banking, financial applications where consistency matters
3. Write-Back (Write-Behind) Cache
Flow:
1. Application writes to cache only
2. Cache asynchronously writes to database later
3. Super fast writes
Code Example:
def update_user(user_id, data):
    # Write to cache immediately
    cache.set(f"user:{user_id}", data, ttl=3600)
    # Mark for async database write
    queue.enqueue("db_write", {"user_id": user_id, "data": data})
    return data  # Return immediately

# Background worker
def process_db_writes():
    while True:
        task = queue.dequeue("db_write")
        db.query("UPDATE users SET ... WHERE id = ?", task["user_id"])
Pros: ✓ Super fast writes ✓ Reduces database load
Cons: ✗ Risk of data loss (cache failure before DB write) ✗ Complex to implement
When to use: High-write scenarios (analytics, logging, social media likes/views counters)
Real Example: YouTube view counter
- View count updated in cache immediately (fast)
- Database updated in batches every few minutes
- Result: Can handle billions of views without database overload
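As a rough sketch of that counter pattern (the key names, flush schedule, and db helper are illustrative, not YouTube's actual implementation): increments go straight to Redis, and a periodic job flushes the accumulated deltas to the database.

import redis

cache = redis.Redis(host="localhost", port=6379)

def record_view(video_id):
    # Fast path: bump an in-memory counter, no database touch
    cache.incr(f"views:{video_id}")

def flush_view_counts():
    # Background job, run every few minutes by a scheduler
    for key in cache.scan_iter("views:*"):
        video_id = key.decode().split(":", 1)[1]
        delta = int(cache.getset(key, 0) or 0)  # read the count and reset it
        if delta:
            db.query(
                "UPDATE videos SET view_count = view_count + ? WHERE id = ?",
                delta, video_id,
            )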
4. Read-Through Cache
Flow:
1. Application requests from cache
2. Cache automatically fetches from database if miss
3. Application doesn't manage cache logic
Pros: ✓ Simplified application logic ✓ Cache abstraction
Cons: ✗ Requires cache infrastructure that supports it ✗ Less control
When to use: Caching solutions with built-in loader support (e.g., Hazelcast, Amazon DynamoDB Accelerator)
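The pattern is also easy to emulate yourself by putting the cache behind a loader. A minimal sketch (the class and loader function are illustrative, not a specific library's API):

class ReadThroughCache:
    """A cache that knows how to load missing keys itself."""

    def __init__(self, cache, loader, ttl=3600):
        self.cache = cache    # e.g. a Redis client
        self.loader = loader  # function that fetches the value from the database
        self.ttl = ttl

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value  # cache hit
        # Cache miss: the cache layer, not the application, loads the data
        value = self.loader(key)
        self.cache.set(key, value, ttl=self.ttl)
        return value

# Application code only talks to the cache:
users = ReadThroughCache(cache, loader=lambda key: db.query(
    "SELECT * FROM users WHERE id = ?", key.split(":")[1]))
user = users.get("user:123")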
Cache Eviction Policies
Problem: Cache has limited memory. What happens when it's full?
Answer: Evict (remove) old data using an eviction policy.
1. LRU (Least Recently Used)
Rule: Remove data that hasn't been accessed in the longest time
Example:
Cache capacity: 3 items
Access: A → B → C → A → D
        ↓
Cache: [A, B, C] (full)
        ↓
Access D → Evict B (least recently used)
        ↓
Cache: [A, C, D]
Implementation (Python):
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return None
        # Move to end (most recently used)
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Remove first item (least recently used)
            self.cache.popitem(last=False)
When to use: Most common policy. Works well for general-purpose caching.
Real Example: Redis supports LRU eviction via the allkeys-lru policy (its default maxmemory-policy is noeviction, so caches typically switch it to allkeys-lru)
2. LFU (Least Frequently Used)
Rule: Remove data that has been accessed the fewest times
Example:
Cache capacity: 3 items
Access: A(3x) → B(1x) → C(2x) → D(1x)
        ↓
Cache: [A, B, C] (full)
        ↓
Access D → Evict B (least frequently used)
        ↓
Cache: [A, C, D]
When to use: When some data is consistently popular (trending videos, hot products)
Real Example: YouTube caching
- Popular videos (millions of views): Stay in cache
- One-time viewed videos: Evicted quickly
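A bare-bones LFU sketch for intuition (real implementations, such as Redis's approximate LFU, are more sophisticated about aging the counters):

from collections import Counter

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}
        self.freq = Counter()  # how many times each key has been accessed

    def get(self, key):
        if key not in self.cache:
            return None
        self.freq[key] += 1
        return self.cache[key]

    def put(self, key, value):
        if key not in self.cache and len(self.cache) >= self.capacity:
            # Evict the least frequently used key
            evict_key = min(self.cache, key=lambda k: self.freq[k])
            del self.cache[evict_key]
            del self.freq[evict_key]
        self.cache[key] = value
        self.freq[key] += 1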
3. FIFO (First In, First Out)
Rule: Remove oldest data first (like a queue)
Simple but not efficient for most use cases.
4. TTL (Time To Live)
Rule: Each item has expiration time. Remove when expired.
Example:
cache.set("user:123", data, ttl=3600) # Expires in 1 hour
After 1 hour, key automatically removed.
When to use: Combined with other policies. Almost always set TTL to prevent stale data.
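Under the hood, a TTL cache just stores an expiry timestamp next to each value and treats expired entries as misses. A simplified in-process sketch:

import time

class TTLCache:
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self.store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:
            del self.store[key]  # lazily expire on read
            return None
        return value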
Cache Invalidation
The hardest problem in computer science:
"There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton
The Problem
Database: user name = "John"
Cache: user name = "John"
        ↓
User updates name to "Jane"
        ↓
Database: user name = "Jane" ✓
Cache: user name = "John" ✗ (STALE!)
        ↓
Users see old name until cache expires
Solutions
1. TTL-Based Invalidation
Strategy: Set expiration time on cached data
cache.set("user:123", user, ttl=300) # Expires in 5 minutes
Pros: ✓ Simple ✓ Automatic cleanup
Cons: ✗ Data can be stale for up to the TTL duration ✗ Cache misses after expiration (performance hit)
When to use: Data that changes infrequently (product catalog, user profiles)
2. Manual Invalidation
Strategy: Explicitly delete cache when data changes
def update_user(user_id, data):
    # Update database
    db.query("UPDATE users SET ... WHERE id = ?", user_id)
    # Invalidate cache
    cache.delete(f"user:{user_id}")
Pros: ✓ Always consistent ✓ No stale data
Cons: ✗ Requires discipline (easy to forget) ✗ Next request is slow (cache miss)
3. Event-Based Invalidation
Strategy: Database triggers cache invalidation
# When user is updated, publish event
def update_user(user_id, data):
    db.query("UPDATE users SET ... WHERE id = ?", user_id)
    # Publish event
    event_bus.publish("user.updated", {"user_id": user_id})

# Cache service listens to events
def on_user_updated(event):
    cache.delete(f"user:{event['user_id']}")
Pros: ✓ Decoupled architecture ✓ Scalable
Cons: ✗ Complex ✗ Eventual consistency (slight delay)
When to use: Microservices architecture
Redis: The Most Popular Cache
Redis = Remote Dictionary Server
Why Redis?
- Fast: All data in memory
- Rich data structures: Strings, lists, sets, sorted sets, hashes
- Persistence: Can save to disk
- High availability: Redis Cluster, replication
Basic Redis Operations
import redis
# Connect to Redis
cache = redis.Redis(host='localhost', port=6379)
# Set value
cache.set("user:123", "John Doe")
# Get value
name = cache.get("user:123") # b'John Doe'
# Set with TTL
cache.setex("session:abc", 3600, "user_data") # Expires in 1 hour
# Delete
cache.delete("user:123")
# Check if exists
exists = cache.exists("user:123") # 0 (False) or 1 (True)
Advanced Redis Patterns
1. Caching Database Queries
import json
def get_products(category):
    cache_key = f"products:{category}"
    # Check cache
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    # Query database
    products = db.query("SELECT * FROM products WHERE category = ?", category)
    # Cache for 10 minutes
    cache.setex(cache_key, 600, json.dumps(products))
    return products
2. Rate Limiting
def check_rate_limit(user_id):
    key = f"rate_limit:{user_id}"
    # Increment counter
    requests = cache.incr(key)
    if requests == 1:
        # First request: set expiration (1 minute window)
        cache.expire(key, 60)
    if requests > 100:
        return False  # Rate limit exceeded
    return True  # Allow request
Real Example: Twitter API rate limiting (300 requests per 15 minutes)
3. Session Storage
def create_session(user_id):
    session_id = generate_random_id()
    # Store session data in Redis
    cache.setex(
        f"session:{session_id}",
        86400,  # 24 hours
        json.dumps({"user_id": user_id, "created_at": time.time()})
    )
    return session_id

def get_session(session_id):
    session = cache.get(f"session:{session_id}")
    return json.loads(session) if session else None
Why Redis for sessions?
- Fast access
- Automatic expiration (TTL)
- Shared across multiple servers (horizontal scaling)
Real-World Caching Examples
Instagram: Feed Generation
Problem: Generate personalized feed for 2 billion users
Solution:
1. Pre-compute feed for active users
2. Store in Redis cache
3. When user opens app:
- Fetch from cache (< 10ms)
- If cache miss: Generate on-demand (500ms)
4. Update cache when:
- New post from followed user
- User likes/comments
Result: Sub-second feed loading
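A rough sketch of that precomputed-feed idea, using one Redis list per user (the helpers get_follower_ids and expensive_feed_query are hypothetical, and real systems only fan out to recently active users):

import json

FEED_LIMIT = 500  # keep only the newest items per user

def fan_out_post(author_id, post):
    # On new post: push it onto each follower's cached feed
    for follower_id in get_follower_ids(author_id):
        key = f"feed:{follower_id}"
        cache.lpush(key, json.dumps(post))
        cache.ltrim(key, 0, FEED_LIMIT - 1)  # cap feed size

def get_feed(user_id, count=50):
    items = cache.lrange(f"feed:{user_id}", 0, count - 1)
    if items:
        return [json.loads(i) for i in items]  # cache hit: < 10ms
    # Cache miss (inactive user): generate on demand, then cache it
    feed = expensive_feed_query(user_id)
    for post in reversed(feed[:FEED_LIMIT]):
        cache.lpush(f"feed:{user_id}", json.dumps(post))
    return feed[:count]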
Amazon: Product Recommendations
Problem: Recommend products based on browsing history
Solution:
1. Cache "frequently bought together" for each product
2. Cache "customers who bought X also bought Y"
3. TTL: 1 hour (recommendations don't need real-time updates)
Result: Instant recommendations without complex database queries
Gmail: Email List
Problem: Fetch email list for millions of users
Solution:
1. Cache first 50 emails for each user
2. TTL: 5 minutes
3. On new email arrival:
- Invalidate cache
- Pre-compute new list
- Update cache
Result: Instant inbox loading
System Design Interview Tips
Common Questions
Q: "How would you design Twitter?"
Answer (caching strategy):
1. Cache user timeline (Redis)
- Key: "timeline:{user_id}"
- Value: List of tweet IDs
- TTL: 5 minutes
2. Cache tweet details (Redis)
- Key: "tweet:{tweet_id}"
- Value: Tweet object
- TTL: 1 hour (tweets rarely change)
3. Cache user profiles (Redis)
- Key: "user:{user_id}"
- Value: Profile object
- TTL: 30 minutes
4. CDN for images/videos
- Edge caching
- Long TTL (immutable content)
Things to Mention
- Multiple cache layers (browser, CDN, application, database)
- Eviction policy (usually LRU)
- Cache invalidation (TTL plus manual invalidation on updates)
- Cache-aside pattern (most common)
- Monitoring (cache hit rate, latency)
Avoid These Mistakes
- Not mentioning cache invalidation (shows lack of depth)
- Caching everything (cache only hot data)
- Ignoring consistency (mention eventual consistency trade-offs)
- Not considering cache failures (what if Redis goes down? see the sketch below)
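For that last point, a common answer is graceful degradation: treat cache errors like misses and fall through to the database, so a Redis outage slows the site down instead of taking it down. A minimal sketch, reusing the same illustrative cache, db, and log helpers as the earlier examples:

import redis

def get_user(user_id):
    try:
        user = cache.get(f"user:{user_id}")
        if user:
            return user  # cache hit
    except redis.RedisError:
        # Cache is down: note it, then keep serving (more slowly) from the database
        log.warning("cache unavailable, falling back to database")
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    try:
        cache.set(f"user:{user_id}", user, ttl=3600)
    except redis.RedisError:
        pass  # caching is best-effort; never fail the request because of it
    return user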
Conclusion
Caching is the secret weapon of scalable systems:
- Browser cache: Instant asset loading
- CDN: Global low-latency delivery
- Application cache (Redis): Fast data access
- Database cache: Reduced query load
Key takeaways:
- Cache at multiple layers
- Use cache-aside pattern for most applications
- LRU eviction policy is default (and usually best)
- Cache invalidation is hard: combine TTL with manual invalidation
- Monitor cache hit rate (aim for 80-90%+)
Remember: Without caching, Facebook would take 30 seconds to load. With caching, it takes 1.5 seconds.
That's the power of caching. Master it, and you'll ace system design interviews and build systems that actually scale.