Load Balancing Explained: Algorithms, Strategies & Real Examples
Master load balancing for system design interviews. Learn round robin, least connections, consistent hashing, and health checks, with examples from Netflix, AWS, and Google.

You Have 10 Servers. One Is Burning at 95% CPU. Nine Are Idle at 10%. What Went Wrong?
This exact scenario happened to me during a production incident. We had horizontally scaled our API to 10 servers. Everything looked good on paper. But users were experiencing timeouts and errors.
The culprit? No load balancer.
Requests were randomly hitting servers, and one unlucky server got hammered while others sat idle. We were paying for 10 servers but effectively using one.
After implementing a load balancer with proper algorithms? Problem solved. Requests evenly distributed, all servers utilized, response times cut in half.
Load balancing isn't optional for scalable systems; it's fundamental. Let me show you exactly how it works.
What Is Load Balancing?
Load balancing = Distributing incoming requests across multiple servers
Think of it like checkout lanes at a supermarket:
- Without load balancer: Everyone lines up at register 1, registers 2-10 are empty
- With load balancer: Customers directed to available registers, wait time minimized
Why Load Balancing?
Problem 1: Single server bottleneck
1 server handling 10,000 requests/sec
  ↓
CPU maxed out
  ↓
Response time: 5 seconds (slow!)
Problem 2: No fault tolerance
1 server fails
  ↓
Entire application down
  ↓
100% downtime
Solution: Load balancer + multiple servers
           [Load Balancer]
                 │
     ┌───────────┼───────────┐
     │           │           │
 [Server 1]  [Server 2]  [Server 3]
3,333 req/s 3,333 req/s 3,333 req/s
Result:
✅ Load distributed evenly
✅ Response time: < 500ms
✅ One server fails? Others handle the load
✅ Zero downtime
Load Balancing in Real Systems
Example 1: Google Search
When you search on Google:
Your request → Google's global load balancer
  ↓
Determines closest data center (based on your location)
  ↓
Data center load balancer
  ↓
Distributes to one of thousands of servers
  ↓
Response in < 200ms
Scale:
- 40,000+ searches per second
- Millions of servers worldwide
- Load balanced at multiple layers
Example 2: Netflix
Problem: Stream video to 230 million subscribers worldwide
Solution:
User in India requests "Stranger Things"
  ↓
AWS Route 53 (DNS-level load balancing)
  ↓
Directs to nearest AWS region (Mumbai)
  ↓
Application load balancer
  ↓
Distributes across 100+ video streaming servers
  ↓
Smooth 4K streaming
Result: 99.99% uptime, even during peak hours
Types of Load Balancers
1. Hardware Load Balancers
Examples: F5, Citrix NetScaler
Characteristics:
- Physical devices
- Extremely high performance
- Very expensive ($10,000 - $100,000+)
- Used by large enterprises (banks, telecoms)
When to use: Legacy systems, strict compliance requirements
2. Software Load Balancers
Examples: Nginx, HAProxy, AWS Application Load Balancer
Characteristics:
- Run on standard servers
- Cost-effective
- Highly flexible
- Easy to scale horizontally
When to use: Most modern applications (startups to enterprises)
3. DNS Load Balancing
How it works:
User requests: www.example.com
  ↓
DNS returns different IP based on:
- Geographic location
- Server health
- Load distribution
Example:
User in USA: DNS returns 54.23.45.67 (US data center)
User in Europe: DNS returns 35.87.23.45 (EU data center)
User in Asia: DNS returns 13.56.78.90 (Asia data center)
Pros: ✅ Global distribution ✅ Low latency for users worldwide
Cons: ❌ DNS caching (changes take time to propagate) ❌ Coarse-grained (per data center, not per server)
Real Example: AWS Route 53, Cloudflare
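To make the idea concrete, here's a toy sketch of the decision a geo-aware DNS service makes. The region-to-IP table reuses the placeholder addresses above; real services such as Route 53 also weigh server health and measured latency.
Code Sketch (Python):
# Hypothetical region -> data center IP table (placeholder addresses)
DATACENTER_IPS = {
    'us': '54.23.45.67',
    'eu': '35.87.23.45',
    'asia': '13.56.78.90',
}

def resolve(domain, client_region):
    # Return the data center IP closest to the client's region,
    # falling back to the US data center for unknown regions
    return DATACENTER_IPS.get(client_region, DATACENTER_IPS['us'])

print(resolve('www.example.com', 'eu'))  # 35.87.23.45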
Load Balancing Algorithms
The algorithm determines which server gets the next request.
1. Round Robin
Strategy: Distribute requests sequentially
Example:
Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (back to start)
Request 5 → Server 2
...
Pros: ✅ Simple ✅ Fair distribution ✅ No state needed
Cons: ❌ Ignores server capacity (all servers treated equally) ❌ Ignores current load ❌ Long requests block a server
When to use: Servers are identical, requests are similar in processing time
Code Example (Python):
class RoundRobinLoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.current = 0

    def get_server(self):
        server = self.servers[self.current]
        self.current = (self.current + 1) % len(self.servers)
        return server

# Usage
lb = RoundRobinLoadBalancer(['Server1', 'Server2', 'Server3'])
print(lb.get_server())  # Server1
print(lb.get_server())  # Server2
print(lb.get_server())  # Server3
print(lb.get_server())  # Server1
2. Weighted Round Robin
Strategy: Servers with higher capacity get more requests
Example:
Server 1: Weight 3 (powerful server)
Server 2: Weight 2 (medium server)
Server 3: Weight 1 (weak server)
Distribution:
Server 1 → 3 requests
Server 2 → 2 requests
Server 3 → 1 request
(Then repeat)
When to use: Servers have different capacities
Code Example:
class WeightedRoundRobinLoadBalancer:
    def __init__(self, servers):
        # servers = [('Server1', 3), ('Server2', 2), ('Server3', 1)]
        self.weighted_servers = []
        for server, weight in servers:
            self.weighted_servers.extend([server] * weight)
        self.current = 0

    def get_server(self):
        server = self.weighted_servers[self.current]
        self.current = (self.current + 1) % len(self.weighted_servers)
        return server

# Usage
lb = WeightedRoundRobinLoadBalancer([
    ('Server1', 3),
    ('Server2', 2),
    ('Server3', 1)
])
# Will return: Server1, Server1, Server1, Server2, Server2, Server3, (repeat)
Real Example: AWS Application Load Balancer supports weighted target groups
3. Least Connections
Strategy: Send request to server with fewest active connections
Example:
Server 1: 5 active connections
Server 2: 3 active connections ← Choose this one!
Server 3: 7 active connections
Next request → Server 2
Why better than Round Robin?
Imagine:
- Request A: Simple query (completes in 10ms)
- Request B: Complex report (takes 5 seconds)
Round Robin:
Request A → Server 1 (finishes quickly)
Request B → Server 2 (still processing...)
Request C → Server 3
Request D → Server 1
Request E → Server 2 (BLOCKED! Request B still running)
Least Connections:
Request A → Server 1 (finishes quickly)
Request B → Server 2 (still processing...)
Request C → Server 3
Request D → Server 1 (fewest connections!)
Request E → Server 3 (avoids Server 2)
Pros: ✅ Handles long-lived connections well ✅ Better for non-uniform requests ✅ More balanced load
Cons: ❌ Requires tracking connection state ❌ Slightly more overhead
When to use: WebSockets, long-polling, variable request times
Real Example: HAProxy's leastconn balancing mode (balance leastconn); HAProxy's default is round robin
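Here's a minimal sketch of the bookkeeping behind least connections (not production code; the hooks for starting and finishing a request are illustrative):
Code Example (Python):
class LeastConnectionsLoadBalancer:
    def __init__(self, servers):
        # Number of in-flight requests per server
        self.connections = {server: 0 for server in servers}

    def get_server(self):
        # Pick the server currently handling the fewest requests
        return min(self.connections, key=self.connections.get)

    def on_request_start(self, server):
        self.connections[server] += 1

    def on_request_end(self, server):
        self.connections[server] -= 1

# Usage
lb = LeastConnectionsLoadBalancer(['Server1', 'Server2', 'Server3'])
server = lb.get_server()
lb.on_request_start(server)  # route the request here
# ... request completes ...
lb.on_request_end(server)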
4. Least Response Time
Strategy: Send request to server with lowest average response time
Example:
Server 1: Average response time 50ms ← Choose this one!
Server 2: Average response time 150ms
Server 3: Average response time 100ms
Next request → Server 1
When to use: Heterogeneous servers, performance-critical applications
Real Example: Nginx Plus (commercial version)
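A rough sketch of the idea, assuming we track a moving window of observed response times per server (the window size and recorded values below are made up for illustration):
Code Example (Python):
from collections import deque

class LeastResponseTimeLoadBalancer:
    def __init__(self, servers, window=100):
        # Keep the last `window` response-time samples per server
        self.samples = {server: deque(maxlen=window) for server in servers}

    def record(self, server, response_time_ms):
        self.samples[server].append(response_time_ms)

    def get_server(self):
        def average(server):
            times = self.samples[server]
            return sum(times) / len(times) if times else 0.0
        # Pick the server with the lowest average observed response time
        return min(self.samples, key=average)

# Usage
lb = LeastResponseTimeLoadBalancer(['Server1', 'Server2', 'Server3'])
lb.record('Server1', 50)
lb.record('Server2', 150)
lb.record('Server3', 100)
print(lb.get_server())  # Server1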
5. IP Hash (Sticky Sessions)
Strategy: Hash user's IP address to consistently route to same server
Example:
import hashlib

def ip_hash(ip_address, servers):
    # Use a stable hash: Python's built-in hash() is randomized per process,
    # so it can't give consistent routing across servers or restarts
    hash_value = int(hashlib.md5(ip_address.encode()).hexdigest(), 16) % len(servers)
    return servers[hash_value]

# User from IP 123.45.67.89 always goes to the same server (e.g., Server 2)
# User from IP 98.76.54.32 always goes to the same server (e.g., Server 1)
Why useful?
Problem: Session state stored locally on server
User login → Server 1 (session stored)
Next request → Server 2 (no session! User appears logged out ❌)
Solution: IP hash ensures same user → same server
User login → Server 1 (session stored)
Next request → Server 1 (session exists! ✅)
Pros: ✅ Simple session management ✅ No need for centralized session storage
Cons: ❌ Uneven distribution (some IPs more active) ❌ If server fails, sessions lost
Better Solution: Use Redis for shared sessions (stateless servers)
When to use: Legacy apps with local session storage
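To make the "Redis for shared sessions" idea above concrete, here's a minimal sketch using the redis-py client. The Redis hostname and the one-hour expiry are assumptions; any shared Redis instance works, and every app server stays stateless:
Code Sketch (Python):
import json
import uuid
import redis

# Hypothetical shared Redis endpoint (placeholder hostname)
r = redis.Redis(host='sessions.example.internal', port=6379)

def create_session(user_id):
    session_id = str(uuid.uuid4())
    # Store session data centrally with a 1-hour expiry,
    # so any server can read it and no sticky sessions are needed
    r.setex(f'session:{session_id}', 3600, json.dumps({'user_id': user_id}))
    return session_id

def get_session(session_id):
    data = r.get(f'session:{session_id}')
    return json.loads(data) if data else None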
6. Consistent Hashing
Problem with IP Hash:
3 servers → Add 4th server
  ↓
Hash function changes
  ↓
ALL users remapped to different servers
  ↓
ALL sessions lost!
Solution: Consistent Hashing
Only ~25% of users remapped when adding server (not 100%)
How it works:
1. Hash servers onto a ring (0-360°)
Server 1: 45°
Server 2: 180°
Server 3: 270°
2. Hash user IP onto same ring
User A: 60° → Goes to next server clockwise (Server 2)
User B: 200° → Goes to Server 3
User C: 300° → Goes to Server 1 (wraps around)
3. Add Server 4 at 135°
User B: 200° → Still goes to Server 3 ✅
User C: 300° → Still goes to Server 1 ✅
Only users between 45° and 135° (like User A) are remapped to Server 4
When to use: Distributed caches (Memcached, Redis Cluster), CDNs
Real Example: AWS ElastiCache, Cassandra, DynamoDB
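Here's a minimal hash-ring sketch showing the mechanics. Real implementations (Ketama, the rings inside ElastiCache or Cassandra) also place many "virtual nodes" per server so load spreads evenly; the server names here are illustrative.
Code Example (Python):
import hashlib
from bisect import bisect, insort

class ConsistentHashRing:
    def __init__(self, servers):
        self.ring = []     # sorted hash positions
        self.owners = {}   # hash position -> server
        for server in servers:
            self.add_server(server)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server):
        position = self._hash(server)
        self.owners[position] = server
        insort(self.ring, position)

    def get_server(self, key):
        # Walk clockwise: the first server position at or after the key's
        # hash owns the key, wrapping around to the start of the ring
        index = bisect(self.ring, self._hash(key)) % len(self.ring)
        return self.owners[self.ring[index]]

# Usage
ring = ConsistentHashRing(['Server1', 'Server2', 'Server3'])
print(ring.get_server('123.45.67.89'))
ring.add_server('Server4')  # only keys in one arc of the ring move
print(ring.get_server('123.45.67.89'))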
Health Checks
Problem: What if a server fails?
Load balancer needs to detect failures and stop sending traffic.
Active Health Checks
Strategy: Load balancer periodically pings servers
Example:
Every 5 seconds:
Load balancer → Server 1: GET /health
Response: 200 OK ✅
Load balancer → Server 2: GET /health
Response: Timeout ❌
Load balancer → Server 3: GET /health
Response: 200 OK ✅
Result: Remove Server 2 from rotation
Nginx Configuration:
upstream backend {
    server backend1.example.com;
    server backend2.example.com;
    server backend3.example.com;

    # Active checks are not built into open-source Nginx: this "check"
    # directive comes from the third-party nginx_upstream_check_module
    # (Nginx Plus uses "health_check" instead).
    # Probe every 5 seconds, mark down after 3 failures,
    # mark healthy again after 2 successes, time out after 2 seconds
    check interval=5000 fall=3 rise=2 timeout=2000;
}
Passive Health Checks
Strategy: Detect failures based on actual traffic
Example:
Request to Server 2 → Error 500
Request to Server 2 → Error 500
Request to Server 2 → Timeout
After 3 failures in 30 seconds:
→ Mark Server 2 as unhealthy
→ Stop sending traffic
When to use: Complement active checks (catch issues faster)
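A toy sketch of the bookkeeping a passive check performs, mirroring the thresholds in the example above (3 failures within 30 seconds); the class and method names are made up:
Code Sketch (Python):
import time

class PassiveHealthTracker:
    def __init__(self, failure_threshold=3, window_seconds=30):
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.failures = {}      # server -> timestamps of recent failures
        self.unhealthy = set()

    def record_result(self, server, ok):
        now = time.time()
        history = self.failures.setdefault(server, [])
        if ok:
            history.clear()
            self.unhealthy.discard(server)   # recover on a successful response
            return
        history.append(now)
        # Keep only failures inside the sliding window
        history[:] = [t for t in history if now - t <= self.window_seconds]
        if len(history) >= self.failure_threshold:
            self.unhealthy.add(server)       # stop sending traffic here

    def is_healthy(self, server):
        return server not in self.unhealthy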
Layer 4 vs Layer 7 Load Balancing
Layer 4 (Transport Layer)
Routes based on: IP address, port
Pros: ✅ Fast (no packet inspection) ✅ Low latency ✅ Works with any protocol (HTTP, WebSocket, TCP)
Cons: ❌ Cannot make routing decisions based on content ❌ Less flexible
Example:
All traffic to port 443 → Backend server pool
(Doesn't look at URL, headers, etc.)
Real Example: AWS Network Load Balancer
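For comparison with the Layer 7 config further below, a Layer 4 setup in open-source Nginx can use the stream module, which forwards raw TCP without parsing HTTP (the hostnames are placeholders):
Nginx Configuration (sketch):
stream {
    upstream tcp_backend {
        server backend1.example.com:443;
        server backend2.example.com:443;
    }
    server {
        listen 443;
        # Raw TCP pass-through: no HTTP parsing, no URL-based routing
        proxy_pass tcp_backend;
    }
}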
Layer 7 (Application Layer)
Routes based on: HTTP headers, URL path, cookies
Pros: ✅ Advanced routing rules ✅ Content-based routing ✅ SSL termination
Cons: ❌ Slower (must parse HTTP) ❌ Higher latency
Example:
example.com/api/* → API server pool
example.com/images/* → Static file server pool
example.com/* → Web server pool
Nginx Configuration:
http {
    upstream api_servers {
        server api1.example.com;
        server api2.example.com;
    }
    upstream web_servers {
        server web1.example.com;
        server web2.example.com;
    }
    server {
        location /api/ {
            proxy_pass http://api_servers;
        }
        location / {
            proxy_pass http://web_servers;
        }
    }
}
Real Example: AWS Application Load Balancer, Nginx, HAProxy
Global vs Local Load Balancing
Local Load Balancing
Scope: Within a single data center
Example:
User request → Data center
  ↓
Load balancer distributes across 10 servers
Global Load Balancing
Scope: Across multiple data centers worldwide
Example:
User in Tokyo
  ↓
DNS returns Tokyo data center IP
  ↓
Tokyo load balancer → Tokyo servers
User in London
  ↓
DNS returns London data center IP
  ↓
London load balancer → London servers
Benefits: ✅ Low latency (users routed to nearest data center) ✅ Disaster recovery (failover to other regions) ✅ Compliance (data stays in region)
Real Example: Cloudflare, AWS Route 53, Google Cloud Load Balancing
System Design Interview Tips
Common Question: "Design Instagram"
Load balancing strategy:
1. DNS Load Balancing
- Route users to nearest region (US, EU, Asia)
2. Regional Load Balancer
- Distribute across multiple data centers in region
3. Application Load Balancer (Layer 7)
- /api/* → API servers
- /images/* → CDN/image servers
- /* → Web servers
4. Algorithm
- Least Connections (variable request times)
- Health checks every 10 seconds
5. Sticky Sessions
- Not needed (stateless design with Redis sessions)
What to Mention
✅ Multiple layers of load balancing (DNS, regional, local) ✅ Algorithm choice with justification ✅ Health checks (active + passive) ✅ Layer 7 for content-based routing ✅ Handling failures (remove unhealthy servers)
Avoid These Mistakes
❌ Not explaining which algorithm and why ❌ Ignoring health checks ❌ Forgetting geographic distribution ❌ Not considering session stickiness (if needed)
Practical Tips
1. Start Simple
You don't need a load balancer until:
- You have 2+ servers
- A single server becomes the bottleneck
Start with: Single server, vertical scaling
Scale to: Load balancer + horizontal scaling
2. Choose the Right Algorithm
Quick guide:
- Uniform requests, identical servers: Round Robin
- Variable request times: Least Connections
- Different server capacities: Weighted Round Robin
- Need session persistence: Consistent Hashing + Redis sessions (avoid IP hash)
3. Monitor Everything
Key metrics:
- Requests per second per server
- Average response time per server
- Error rate per server
- Connection count per server
Use: Prometheus + Grafana, AWS CloudWatch, Datadog
4. Test Failure Scenarios
Simulate:
- Kill a server (does traffic reroute?)
- Slow down a server (does load balancer detect?)
- Spike traffic (does autoscaling trigger?)
Real Example: Netflix Chaos Monkey (randomly kills production servers to test resilience)
Conclusion
Load balancing is the backbone of scalable systems:
- Distributes load across multiple servers
- Eliminates single point of failure
- Enables horizontal scaling
- Reduces latency (geographic routing)
Key takeaways:
- Use Layer 7 load balancing for content-based routing
- Least Connections algorithm for most cases
- Always implement health checks
- Go stateless (no sticky sessions if possible)
- Load balance at multiple layers (DNS, regional, local)
Remember: 10 servers without a load balancer = wasted money and poor performance. 10 servers with a load balancer = scalable, resilient system.
Master load balancing, and you'll never face the "one server at 95%, nine at 10%" problem again.