Load Balancing Explained: Algorithms, Strategies & Real Examples
Master load balancing for system design interviews. Learn round robin, least connections, consistent hashing, and health checks, with examples from Netflix, AWS, and Google.

You Have 10 Servers. One Is Burning at 95% CPU. Nine Are Idle at 10%. What Went Wrong?
This exact scenario happened to me during a production incident. We had horizontally scaled our API to 10 servers. Everything looked good on paper. But users were experiencing timeouts and errors.
The culprit? No load balancer.
Requests were randomly hitting servers, and one unlucky server got hammered while others sat idle. We were paying for 10 servers but effectively using one.
After implementing a load balancer with proper algorithms? Problem solved. Requests evenly distributed, all servers utilized, response times cut in half.
Load balancing isn't optional for scalable systems; it's fundamental. Let me show you exactly how it works.
What Is Load Balancing?
Load balancing = Distributing incoming requests across multiple servers
Think of it like checkout lanes at a supermarket:
- Without load balancer: Everyone lines up at register 1, registers 2-10 are empty
- With load balancer: Customers directed to available registers, wait time minimized
Why Load Balancing?
Problem 1: Single server bottleneck
1 server handling 10,000 requests/sec
  ↓
CPU maxed out
  ↓
Response time: 5 seconds (slow!)
Problem 2: No fault tolerance
1 server fails
  ↓
Entire application down
  ↓
100% downtime
Solution: Load balancer + multiple servers
           [Load Balancer]
                 │
     ┌───────────┼───────────┐
     │           │           │
 [Server 1]  [Server 2]  [Server 3]
3,333 req/s 3,333 req/s 3,333 req/s
Result:
✅ Load distributed evenly
✅ Response time: < 500ms
✅ One server fails? Others handle the load
✅ Zero downtime
Load Balancing in Real Systems
Example 1: Google Search
When you search on Google:
Your request → Google's global load balancer
  ↓
Determines closest data center (based on your location)
  ↓
Data center load balancer
  ↓
Distributes to one of thousands of servers
  ↓
Response in < 200ms
Scale:
- 40,000+ searches per second
- Millions of servers worldwide
- Load balanced at multiple layers
Example 2: Netflix
Problem: Stream video to 230 million subscribers worldwide
Solution:
User in India requests "Stranger Things"
  ↓
AWS Route 53 (DNS-level load balancing)
  ↓
Directs to nearest AWS region (Mumbai)
  ↓
Application load balancer
  ↓
Distributes across 100+ video streaming servers
  ↓
Smooth 4K streaming
Result: 99.99% uptime, even during peak hours
Types of Load Balancers
1. Hardware Load Balancers
Examples: F5, Citrix NetScaler
Characteristics:
- Physical devices
- Extremely high performance
- Very expensive ($10,000 - $100,000+)
- Used by large enterprises (banks, telecoms)
When to use: Legacy systems, strict compliance requirements
2. Software Load Balancers
Examples: Nginx, HAProxy, AWS Application Load Balancer
Characteristics:
- Run on standard servers
- Cost-effective
- Highly flexible
- Easy to scale horizontally
When to use: Most modern applications (startups to enterprises)
3. DNS Load Balancing
How it works:
User requests: www.example.com
  ↓
DNS returns different IP based on:
- Geographic location
- Server health
- Load distribution
Example:
User in USA: DNS returns 54.23.45.67 (US data center)
User in Europe: DNS returns 35.87.23.45 (EU data center)
User in Asia: DNS returns 13.56.78.90 (Asia data center)
Pros: ✅ Global distribution ✅ Low latency for users worldwide
Cons: ❌ DNS caching (changes take time to propagate) ❌ Coarse-grained (per data center, not per server)
Real Example: AWS Route 53, Cloudflare
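To make the idea concrete, here's a toy sketch of the decision a geo-aware DNS service makes. The region-to-IP table reuses the placeholder addresses above; real services such as Route 53 also weigh server health and measured latency.
Code Sketch (Python):
# Hypothetical region -> data center IP table (placeholder addresses)
DATACENTER_IPS = {
    'us': '54.23.45.67',
    'eu': '35.87.23.45',
    'asia': '13.56.78.90',
}

def resolve(domain, client_region):
    # Return the data center IP closest to the client's region,
    # falling back to the US data center for unknown regions
    return DATACENTER_IPS.get(client_region, DATACENTER_IPS['us'])

print(resolve('www.example.com', 'eu'))  # 35.87.23.45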
Load Balancing Algorithms
The algorithm determines which server gets the next request.
1. Round Robin
Strategy: Distribute requests sequentially
Example:
Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (back to start)
Request 5 → Server 2
...
Pros: ✅ Simple ✅ Fair distribution ✅ No state needed
Cons: ❌ Ignores server capacity (all servers treated equally) ❌ Ignores current load ❌ Long requests block a server
When to use: Servers are identical, requests are similar in processing time
Code Example (Python):
class RoundRobinLoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.current = 0

    def get_server(self):
        server = self.servers[self.current]
        self.current = (self.current + 1) % len(self.servers)
        return server

# Usage
lb = RoundRobinLoadBalancer(['Server1', 'Server2', 'Server3'])
print(lb.get_server())  # Server1
print(lb.get_server())  # Server2
print(lb.get_server())  # Server3
print(lb.get_server())  # Server1
2. Weighted Round Robin
Strategy: Servers with higher capacity get more requests
Example:
Server 1: Weight 3 (powerful server)
Server 2: Weight 2 (medium server)
Server 3: Weight 1 (weak server)
Distribution:
Server 1 → 3 requests
Server 2 → 2 requests
Server 3 → 1 request
(Then repeat)
When to use: Servers have different capacities
Code Example:
class WeightedRoundRobinLoadBalancer:
    def __init__(self, servers):
        # servers = [('Server1', 3), ('Server2', 2), ('Server3', 1)]
        self.weighted_servers = []
        for server, weight in servers:
            self.weighted_servers.extend([server] * weight)
        self.current = 0

    def get_server(self):
        server = self.weighted_servers[self.current]
        self.current = (self.current + 1) % len(self.weighted_servers)
        return server

# Usage
lb = WeightedRoundRobinLoadBalancer([
    ('Server1', 3),
    ('Server2', 2),
    ('Server3', 1)
])
# Will return: Server1, Server1, Server1, Server2, Server2, Server3, (repeat)
Real Example: AWS Application Load Balancer supports weighted target groups
3. Least Connections
Strategy: Send request to server with fewest active connections
Example:
Server 1: 5 active connections
Server 2: 3 active connections ← Choose this one!
Server 3: 7 active connections
Next request → Server 2
Why better than Round Robin?
Imagine:
- Request A: Simple query (completes in 10ms)
- Request B: Complex report (takes 5 seconds)
Round Robin:
Request A → Server 1 (finishes quickly)
Request B → Server 2 (still processing...)
Request C → Server 3
Request D → Server 1
Request E → Server 2 (BLOCKED! Request B still running)
Least Connections:
Request A → Server 1 (finishes quickly)
Request B → Server 2 (still processing...)
Request C → Server 3
Request D → Server 1 (fewest connections!)
Request E → Server 3 (avoids Server 2)
Pros: ✅ Handles long-lived connections well ✅ Better for non-uniform requests ✅ More balanced load
Cons: ❌ Requires tracking connection state ❌ Slightly more overhead
When to use: WebSockets, long-polling, variable request times
Real Example: HAProxy's leastconn balancing mode (balance leastconn); HAProxy's default is round robin
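Here's a minimal sketch of the bookkeeping behind least connections (not production code; the hooks for starting and finishing a request are illustrative):
Code Example (Python):
class LeastConnectionsLoadBalancer:
    def __init__(self, servers):
        # Number of in-flight requests per server
        self.connections = {server: 0 for server in servers}

    def get_server(self):
        # Pick the server currently handling the fewest requests
        return min(self.connections, key=self.connections.get)

    def on_request_start(self, server):
        self.connections[server] += 1

    def on_request_end(self, server):
        self.connections[server] -= 1

# Usage
lb = LeastConnectionsLoadBalancer(['Server1', 'Server2', 'Server3'])
server = lb.get_server()
lb.on_request_start(server)  # route the request here
# ... request completes ...
lb.on_request_end(server)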
4. Least Response Time
Strategy: Send request to server with lowest average response time
Example:
Server 1: Average response time 50ms ← Choose this one!
Server 2: Average response time 150ms
Server 3: Average response time 100ms
Next request → Server 1
When to use: Heterogeneous servers, performance-critical applications
Real Example: Nginx Plus (commercial version)
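A rough sketch of the idea, assuming we track a moving window of observed response times per server (the window size and recorded values below are made up for illustration):
Code Example (Python):
from collections import deque

class LeastResponseTimeLoadBalancer:
    def __init__(self, servers, window=100):
        # Keep the last `window` response-time samples per server
        self.samples = {server: deque(maxlen=window) for server in servers}

    def record(self, server, response_time_ms):
        self.samples[server].append(response_time_ms)

    def get_server(self):
        def average(server):
            times = self.samples[server]
            return sum(times) / len(times) if times else 0.0
        # Pick the server with the lowest average observed response time
        return min(self.samples, key=average)

# Usage
lb = LeastResponseTimeLoadBalancer(['Server1', 'Server2', 'Server3'])
lb.record('Server1', 50)
lb.record('Server2', 150)
lb.record('Server3', 100)
print(lb.get_server())  # Server1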
5. IP Hash (Sticky Sessions)
Strategy: Hash user's IP address to consistently route to same server
Example:
import hashlib

def ip_hash(ip_address, servers):
    # Use a stable hash: Python's built-in hash() is randomized per process,
    # so it can't give consistent routing across servers or restarts
    hash_value = int(hashlib.md5(ip_address.encode()).hexdigest(), 16) % len(servers)
    return servers[hash_value]

# User from IP 123.45.67.89 always goes to the same server (e.g., Server 2)
# User from IP 98.76.54.32 always goes to the same server (e.g., Server 1)
Why useful?
Problem: Session state stored locally on server
User login → Server 1 (session stored)
Next request → Server 2 (no session! User appears logged out ❌)
Solution: IP hash ensures same user → same server
User login → Server 1 (session stored)
Next request → Server 1 (session exists! ✅)
Pros: ✅ Simple session management ✅ No need for centralized session storage
Cons: ❌ Uneven distribution (some IPs more active) ❌ If server fails, sessions lost
Better Solution: Use Redis for shared sessions (stateless servers)
When to use: Legacy apps with local session storage
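To make the "Redis for shared sessions" idea above concrete, here's a minimal sketch using the redis-py client. The Redis hostname and the one-hour expiry are assumptions; any shared Redis instance works, and every app server stays stateless:
Code Sketch (Python):
import json
import uuid
import redis

# Hypothetical shared Redis endpoint (placeholder hostname)
r = redis.Redis(host='sessions.example.internal', port=6379)

def create_session(user_id):
    session_id = str(uuid.uuid4())
    # Store session data centrally with a 1-hour expiry,
    # so any server can read it and no sticky sessions are needed
    r.setex(f'session:{session_id}', 3600, json.dumps({'user_id': user_id}))
    return session_id

def get_session(session_id):
    data = r.get(f'session:{session_id}')
    return json.loads(data) if data else None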
6. Consistent Hashing
Problem with IP Hash:
3 servers → Add 4th server
  ↓
Hash function changes
  ↓
ALL users remapped to different servers
  ↓
ALL sessions lost!
Solution: Consistent Hashing
Only ~25% of users remapped when adding server (not 100%)
How it works:
1. Hash servers onto a ring (0-360°)
Server 1: 45°
Server 2: 180°
Server 3: 270°
2. Hash user IP onto same ring
User A: 60° → Goes to next server clockwise (Server 2)
User B: 200° → Goes to Server 3
User C: 300° → Goes to Server 1 (wraps around)
3. Add Server 4 at 135°
User B: 200° → Still goes to Server 3 ✅
User C: 300° → Still goes to Server 1 ✅
Only users between 45° and 135° (like User A) are remapped to Server 4
When to use: Distributed caches (Memcached, Redis Cluster), CDNs
Real Example: AWS ElastiCache, Cassandra, DynamoDB
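Here's a minimal hash-ring sketch showing the mechanics. Real implementations (Ketama, the rings inside ElastiCache or Cassandra) also place many "virtual nodes" per server so load spreads evenly; the server names here are illustrative.
Code Example (Python):
import hashlib
from bisect import bisect, insort

class ConsistentHashRing:
    def __init__(self, servers):
        self.ring = []     # sorted hash positions
        self.owners = {}   # hash position -> server
        for server in servers:
            self.add_server(server)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server):
        position = self._hash(server)
        self.owners[position] = server
        insort(self.ring, position)

    def get_server(self, key):
        # Walk clockwise: the first server position at or after the key's
        # hash owns the key, wrapping around to the start of the ring
        index = bisect(self.ring, self._hash(key)) % len(self.ring)
        return self.owners[self.ring[index]]

# Usage
ring = ConsistentHashRing(['Server1', 'Server2', 'Server3'])
print(ring.get_server('123.45.67.89'))
ring.add_server('Server4')  # only keys in one arc of the ring move
print(ring.get_server('123.45.67.89'))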
Health Checks
Problem: What if a server fails?
Load balancer needs to detect failures and stop sending traffic.
Active Health Checks
Strategy: Load balancer periodically pings servers
Example:
Every 5 seconds:
Load balancer → Server 1: GET /health
Response: 200 OK ✅
Load balancer → Server 2: GET /health
Response: Timeout ❌
Load balancer → Server 3: GET /health
Response: 200 OK ✅
Result: Remove Server 2 from rotation
Nginx Configuration:
upstream backend {
    server backend1.example.com;
    server backend2.example.com;
    server backend3.example.com;

    # Active checks are not built into open-source Nginx: this "check"
    # directive comes from the third-party nginx_upstream_check_module
    # (Nginx Plus uses "health_check" instead).
    # Probe every 5 seconds, mark down after 3 failures,
    # mark healthy again after 2 successes, time out after 2 seconds
    check interval=5000 fall=3 rise=2 timeout=2000;
}
Passive Health Checks
Strategy: Detect failures based on actual traffic
Example:
Request to Server 2 → Error 500
Request to Server 2 → Error 500
Request to Server 2 → Timeout
After 3 failures in 30 seconds:
→ Mark Server 2 as unhealthy
→ Stop sending traffic
When to use: Complement active checks (catch issues faster)
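A toy sketch of the bookkeeping a passive check performs, mirroring the thresholds in the example above (3 failures within 30 seconds); the class and method names are made up:
Code Sketch (Python):
import time

class PassiveHealthTracker:
    def __init__(self, failure_threshold=3, window_seconds=30):
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.failures = {}      # server -> timestamps of recent failures
        self.unhealthy = set()

    def record_result(self, server, ok):
        now = time.time()
        history = self.failures.setdefault(server, [])
        if ok:
            history.clear()
            self.unhealthy.discard(server)   # recover on a successful response
            return
        history.append(now)
        # Keep only failures inside the sliding window
        history[:] = [t for t in history if now - t <= self.window_seconds]
        if len(history) >= self.failure_threshold:
            self.unhealthy.add(server)       # stop sending traffic here

    def is_healthy(self, server):
        return server not in self.unhealthy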
Layer 4 vs Layer 7 Load Balancing
Layer 4 (Transport Layer)
Routes based on: IP address, port
Pros: ✅ Fast (no packet inspection) ✅ Low latency ✅ Works with any protocol (HTTP, WebSocket, TCP)
Cons: ❌ Cannot make routing decisions based on content ❌ Less flexible
Example:
All traffic to port 443 → Backend server pool
(Doesn't look at URL, headers, etc.)
Real Example: AWS Network Load Balancer
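For comparison with the Layer 7 config further below, a Layer 4 setup in open-source Nginx can use the stream module, which forwards raw TCP without parsing HTTP (the hostnames are placeholders):
Nginx Configuration (sketch):
stream {
    upstream tcp_backend {
        server backend1.example.com:443;
        server backend2.example.com:443;
    }
    server {
        listen 443;
        # Raw TCP pass-through: no HTTP parsing, no URL-based routing
        proxy_pass tcp_backend;
    }
}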
Layer 7 (Application Layer)
Routes based on: HTTP headers, URL path, cookies
Pros: ✅ Advanced routing rules ✅ Content-based routing ✅ SSL termination
Cons: ❌ Slower (must parse HTTP) ❌ Higher latency
Example:
example.com/api/* → API server pool
example.com/images/* → Static file server pool
example.com/* → Web server pool
Nginx Configuration:
http {
    upstream api_servers {
        server api1.example.com;
        server api2.example.com;
    }
    upstream web_servers {
        server web1.example.com;
        server web2.example.com;
    }
    server {
        location /api/ {
            proxy_pass http://api_servers;
        }
        location / {
            proxy_pass http://web_servers;
        }
    }
}
Real Example: AWS Application Load Balancer, Nginx, HAProxy
Global vs Local Load Balancing
Local Load Balancing
Scope: Within a single data center
Example:
User request → Data center
  ↓
Load balancer distributes across 10 servers
Global Load Balancing
Scope: Across multiple data centers worldwide
Example:
User in Tokyo
  ↓
DNS returns Tokyo data center IP
  ↓
Tokyo load balancer → Tokyo servers
User in London
  ↓
DNS returns London data center IP
  ↓
London load balancer → London servers
Benefits: ✅ Low latency (users routed to nearest data center) ✅ Disaster recovery (failover to other regions) ✅ Compliance (data stays in region)
Real Example: Cloudflare, AWS Route 53, Google Cloud Load Balancing
System Design Interview Tips
Common Question: "Design Instagram"
Load balancing strategy:
1. DNS Load Balancing
- Route users to nearest region (US, EU, Asia)
2. Regional Load Balancer
- Distribute across multiple data centers in region
3. Application Load Balancer (Layer 7)
- /api/* → API servers
- /images/* → CDN/image servers
- /* → Web servers
4. Algorithm
- Least Connections (variable request times)
- Health checks every 10 seconds
5. Sticky Sessions
- Not needed (stateless design with Redis sessions)
What to Mention
✅ Multiple layers of load balancing (DNS, regional, local) ✅ Algorithm choice with justification ✅ Health checks (active + passive) ✅ Layer 7 for content-based routing ✅ Handling failures (remove unhealthy servers)
Avoid These Mistakes
❌ Not explaining which algorithm and why ❌ Ignoring health checks ❌ Forgetting geographic distribution ❌ Not considering session stickiness (if needed)
Practical Tips
1. Start Simple
You don't need a load balancer until:
- You have 2+ servers
- A single server becomes the bottleneck
Start with: Single server, vertical scaling
Scale to: Load balancer + horizontal scaling
2. Choose the Right Algorithm
Quick guide:
- Uniform requests, identical servers: Round Robin
- Variable request times: Least Connections
- Different server capacities: Weighted Round Robin
- Need session persistence: Consistent Hashing + Redis sessions (avoid IP hash)
3. Monitor Everything
Key metrics:
- Requests per second per server
- Average response time per server
- Error rate per server
- Connection count per server
Use: Prometheus + Grafana, AWS CloudWatch, Datadog
4. Test Failure Scenarios
Simulate:
- Kill a server (does traffic reroute?)
- Slow down a server (does load balancer detect?)
- Spike traffic (does autoscaling trigger?)
Real Example: Netflix Chaos Monkey (randomly kills production servers to test resilience)
Conclusion
Load balancing is the backbone of scalable systems:
- Distributes load across multiple servers
- Eliminates single point of failure
- Enables horizontal scaling
- Reduces latency (geographic routing)
Key takeaways:
- Use Layer 7 load balancing for content-based routing
- Least Connections algorithm for most cases
- Always implement health checks
- Go stateless (no sticky sessions if possible)
- Load balance at multiple layers (DNS, regional, local)
Remember: 10 servers without a load balancer = wasted money and poor performance. 10 servers with a load balancer = scalable, resilient system.
Master load balancing, and you'll never face the "one server at 95%, nine at 10%" problem again.