Load Balancing Explained: Algorithms, Strategies & Real Examples

Master load balancing for system design interviews. Learn round-robin, least connections, consistent hashing, health checks with examples from Netflix, AWS, Google

📅 Published: May 12, 2025 ✏️ Updated: June 8, 2025 By Ojaswi Athghara
#load-balancer #nginx #dsa #system-design #scalability

Load Balancing Explained: Algorithms, Strategies & Real Examples

You Have 10 Servers. One Is Burning at 95% CPU. Nine Are Idle at 10%. What Went Wrong?

This exact scenario happened to me during a production incident. We had horizontally scaled our API to 10 servers. Everything looked good on paper. But users were experiencing timeouts and errors.

The culprit? No load balancer.

Requests were randomly hitting servers, and one unlucky server got hammered while others sat idle. We were paying for 10 servers but effectively using one.

After implementing a load balancer with proper algorithms? Problem solved. Requests evenly distributed, all servers utilized, response times cut in half.

Load balancing isn't optional for scalable systems; it's fundamental. Let me show you exactly how it works.


What Is Load Balancing?

Load balancing = Distributing incoming requests across multiple servers

Think of it like checkout lanes at a supermarket:

  • Without load balancer: Everyone lines up at register 1, registers 2-10 are empty
  • With load balancer: Customers directed to available registers, wait time minimized

Why Load Balancing?

Problem 1: Single server bottleneck

1 server handling 10,000 requests/sec
    ↓
CPU maxed out
    ↓
Response time: 5 seconds (slow!)

Problem 2: No fault tolerance

1 server fails
    ↓
Entire application down
    ↓
100% downtime

Solution: Load balancer + multiple servers

           [Load Balancer]
                 ↓
    ┌────────────┼────────────┐
    ↓            ↓            ↓
[Server 1]   [Server 2]   [Server 3]
 3,333 req/s  3,333 req/s  3,333 req/s

Result:
✅ Load distributed evenly
✅ Response time: < 500ms
✅ One server fails? Others handle the load
✅ Zero downtime

Load Balancing in Real Systems

Example 1: Google Search

When you search on Google:

Your request → Google's global load balancer
     ↓
Determines closest data center (based on your location)
     ↓
Data center load balancer
     ↓
Distributes to one of thousands of servers
     ↓
Response in < 200ms

Scale:

  • 40,000+ searches per second
  • Millions of servers worldwide
  • Load balanced at multiple layers

Example 2: Netflix

Problem: Stream video to 230 million subscribers worldwide

Solution:

User in India requests "Stranger Things"
     ↓
AWS Route 53 (DNS-level load balancing)
     ↓
Directs to nearest AWS region (Mumbai)
     ↓
Application load balancer
     ↓
Distributes across 100+ video streaming servers
     ↓
Smooth 4K streaming

Result: 99.99% uptime, even during peak hours


Types of Load Balancers

1. Hardware Load Balancers

Examples: F5, Citrix NetScaler

Characteristics:

  • Physical devices
  • Extremely high performance
  • Very expensive ($10,000 - $100,000+)
  • Used by large enterprises (banks, telecoms)

When to use: Legacy systems, strict compliance requirements


2. Software Load Balancers

Examples: Nginx, HAProxy, AWS Application Load Balancer

Characteristics:

  • Run on standard servers
  • Cost-effective
  • Highly flexible
  • Easy to scale horizontally

When to use: Most modern applications (startups to enterprises)


3. DNS Load Balancing

How it works:

User requests: www.example.com
     ↓
DNS returns different IP based on:
  - Geographic location
  - Server health
  - Load distribution

Example:

User in USA:     DNS returns 54.23.45.67 (US data center)
User in Europe:  DNS returns 35.87.23.45 (EU data center)
User in Asia:    DNS returns 13.56.78.90 (Asia data center)
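
Here's a toy sketch of the idea in Python (the table and IPs simply mirror the example above; real geo-DNS services like Route 53 also factor in server health and measured latency):

# Hypothetical geo-routing table, for illustration only
GEO_ROUTES = {
    'US':   '54.23.45.67',
    'EU':   '35.87.23.45',
    'ASIA': '13.56.78.90',
}

def resolve(region):
    # Unknown regions fall back to the US data center
    return GEO_ROUTES.get(region, GEO_ROUTES['US'])

print(resolve('EU'))  # 35.87.23.45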

Pros:
✅ Global distribution
✅ Low latency for users worldwide

Cons:
❌ DNS caching (changes take time to propagate)
❌ Coarse-grained (per data center, not per server)

Real Example: AWS Route 53, Cloudflare


Load Balancing Algorithms

The algorithm determines which server gets the next request.

1. Round Robin

Strategy: Distribute requests sequentially

Example:

Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1  (back to start)
Request 5 → Server 2
...

Pros:
✅ Simple
✅ Fair distribution
✅ No state needed

Cons:
❌ Ignores server capacity (all servers treated equally)
❌ Ignores current load
❌ Long requests can pile up on one server

When to use: Servers are identical, requests are similar in processing time

Code Example (Python):

class RoundRobinLoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.current = 0  # Index of the next server to pick
    
    def get_server(self):
        server = self.servers[self.current]
        # Wrap back to the first server after the last one
        self.current = (self.current + 1) % len(self.servers)
        return server

# Usage
lb = RoundRobinLoadBalancer(['Server1', 'Server2', 'Server3'])
print(lb.get_server())  # Server1
print(lb.get_server())  # Server2
print(lb.get_server())  # Server3
print(lb.get_server())  # Server1

2. Weighted Round Robin

Strategy: Servers with higher capacity get more requests

Example:

Server 1: Weight 3 (powerful server)
Server 2: Weight 2 (medium server)
Server 3: Weight 1 (weak server)

Distribution:
Server 1 → 3 requests
Server 2 → 2 requests
Server 3 → 1 request
(Then repeat)

When to use: Servers have different capacities

Code Example:

class WeightedRoundRobinLoadBalancer:
    def __init__(self, servers):
        # servers = [('Server1', 3), ('Server2', 2), ('Server3', 1)]
        self.weighted_servers = []
        for server, weight in servers:
            self.weighted_servers.extend([server] * weight)
        
        self.current = 0
    
    def get_server(self):
        server = self.weighted_servers[self.current]
        self.current = (self.current + 1) % len(self.weighted_servers)
        return server

# Usage
lb = WeightedRoundRobinLoadBalancer([
    ('Server1', 3),
    ('Server2', 2),
    ('Server3', 1)
])

# Will return: Server1, Server1, Server1, Server2, Server2, Server3, (repeat)

Note: expanding the list like this sends bursts to the same server (Server1 three times in a row). Nginx uses a "smooth" weighted round robin that spreads the picks out more evenly, but the expansion version is easier to follow.

Real Example: AWS Application Load Balancer supports weighted target groups


3. Least Connections

Strategy: Send request to server with fewest active connections

Example:

Server 1: 5 active connections
Server 2: 3 active connections  ← Choose this one!
Server 3: 7 active connections

Next request → Server 2

Why better than Round Robin?

Imagine:

  • Request A: Simple query (completes in 10ms)
  • Request B: Complex report (takes 5 seconds)

Round Robin:

Request A → Server 1 (finishes quickly)
Request B → Server 2 (still processing...)
Request C → Server 3
Request D → Server 1
Request E → Server 2 (stuck behind Request B, which is still running!)

Least Connections:

Request A → Server 1 (finishes quickly)
Request B → Server 2 (still processing...)
Request C → Server 3
Request D → Server 1 (least connections!)
Request E → Server 3 (avoids Server 2)

Pros:
✅ Handles long-lived connections well
✅ Better for non-uniform requests
✅ More balanced load

Cons:
❌ Requires tracking connection state
❌ Slightly more overhead

When to use: WebSockets, long-polling, variable request times
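
Code Example (Python): a minimal single-process sketch. A production balancer would update these counters concurrently and handle servers joining or leaving:

class LeastConnectionsLoadBalancer:
    def __init__(self, servers):
        # Active connection count per server
        self.connections = {server: 0 for server in servers}
    
    def get_server(self):
        # Pick the server with the fewest active connections
        return min(self.connections, key=self.connections.get)
    
    def on_request_start(self, server):
        self.connections[server] += 1
    
    def on_request_end(self, server):
        self.connections[server] -= 1

# Usage
lb = LeastConnectionsLoadBalancer(['Server1', 'Server2', 'Server3'])
server = lb.get_server()      # Server1 (all tied at 0)
lb.on_request_start(server)   # Server1 now has 1 active connection
print(lb.get_server())        # Server2 (fewest connections)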

Real Example: HAProxy's leastconn mode (HAProxy defaults to round robin, but recommends leastconn for long-lived sessions such as LDAP and SQL)


4. Least Response Time

Strategy: Send request to server with lowest average response time

Example:

Server 1: Average response time 50ms  ← Choose this one!
Server 2: Average response time 150ms
Server 3: Average response time 100ms

Next request → Server 1

When to use: Heterogeneous servers, performance-critical applications
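
Code Example (Python): one way to approximate this is an exponentially weighted moving average (EWMA) of response times per server. This is a sketch of the idea, not Nginx Plus's exact algorithm:

class LeastResponseTimeLoadBalancer:
    def __init__(self, servers, alpha=0.2):
        self.alpha = alpha  # Weight given to the most recent sample
        self.avg_ms = {server: 0.0 for server in servers}
    
    def get_server(self):
        # Pick the server with the lowest average response time
        # (unmeasured servers start at 0.0, so warm them up first)
        return min(self.avg_ms, key=self.avg_ms.get)
    
    def record_response(self, server, elapsed_ms):
        # EWMA: recent samples count more than old ones
        old = self.avg_ms[server]
        self.avg_ms[server] = (1 - self.alpha) * old + self.alpha * elapsed_ms

# Usage
lb = LeastResponseTimeLoadBalancer(['Server1', 'Server2', 'Server3'])
lb.record_response('Server1', 50)
lb.record_response('Server2', 150)
lb.record_response('Server3', 100)
print(lb.get_server())  # Server1 (lowest average)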

Real Example: Nginx Plus's least_time method (commercial version)


5. IP Hash (Sticky Sessions)

Strategy: Hash user's IP address to consistently route to same server

Example:

import hashlib

def ip_hash(ip_address, servers):
    # Use a stable hash: Python's built-in hash() is randomized per
    # process, so the same IP could land elsewhere after a restart
    hash_value = int(hashlib.md5(ip_address.encode()).hexdigest(), 16)
    return servers[hash_value % len(servers)]

# The same client IP always maps to the same server
# (as long as the server list doesn't change)

Why useful?

Problem: Session state stored locally on server

User login → Server 1 (session stored)
Next request → Server 2 (no session! User appears logged out ❌)

Solution: IP hash ensures same user → same server

User login → Server 1 (session stored)
Next request → Server 1 (session exists! ✅)

Pros:
✅ Simple session management
✅ No need for centralized session storage

Cons:
❌ Uneven distribution (some IPs are far more active than others)
❌ If a server fails, its sessions are lost

Better Solution: Use Redis for shared sessions (stateless servers)
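
For example, with the redis Python package (the hostname below is a placeholder), any server can load any user's session, so no stickiness is needed:

import json
import redis  # Third-party package: pip install redis

# One shared session store for the whole server fleet
r = redis.Redis(host='sessions.example.com', port=6379)

def save_session(session_id, data, ttl_seconds=3600):
    # setex stores the value with an expiry, so stale sessions clean themselves up
    r.setex(f'session:{session_id}', ttl_seconds, json.dumps(data))

def load_session(session_id):
    raw = r.get(f'session:{session_id}')
    return json.loads(raw) if raw else None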

When to use: Legacy apps with local session storage


6. Consistent Hashing

Problem with IP Hash:

3 servers → Add 4th server
    ↓
Hash function changes
    ↓
ALL users remapped to different servers
    ↓
ALL sessions lost!

Solution: Consistent Hashing

Only ~25% of users are remapped when adding a fourth server (not 100%). In general, adding an Nth server remaps roughly 1/N of the keys.

How it works:

1. Hash servers onto a ring (0-360°)
   Server 1: 45°
   Server 2: 180°
   Server 3: 270°

2. Hash user IP onto same ring
   User A: 60° → Goes to next server clockwise (Server 2)
   User B: 200° → Goes to Server 3
   User C: 300° → Wraps around to Server 1

3. Add Server 4 at 135°
   User A: 60° → Remapped to Server 4 (falls in the 45°-135° range)
   User B: 200° → Still goes to Server 3 ✅
   User C: 300° → Still goes to Server 1 ✅
   Only users between 45° and 135° are remapped
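
Code Example (Python): a minimal hash ring. This sketch places one point per server; real implementations typically add hundreds of virtual nodes per server to even out the distribution:

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers):
        # Sorted list of (position, server) points on the ring
        self.ring = sorted((self._hash(s), s) for s in servers)
    
    def _hash(self, key):
        # Stable hash (unlike Python's built-in hash(), which is randomized)
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
    
    def get_server(self, key):
        # Walk clockwise to the first server at or past the key's position
        index = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[index][1]
    
    def add_server(self, server):
        # Only keys between the new point and its predecessor get remapped
        bisect.insort(self.ring, (self._hash(server), server))

# Usage
ring = ConsistentHashRing(['Server1', 'Server2', 'Server3'])
print(ring.get_server('123.45.67.89'))  # Same IP always maps to same server
ring.add_server('Server4')              # Most keys keep their old server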

When to use: Distributed caches (Memcached, Redis Cluster), CDNs

Real Example: AWS ElastiCache, Cassandra, DynamoDB


Health Checks

Problem: What if a server fails?

Load balancer needs to detect failures and stop sending traffic.

Active Health Checks

Strategy: Load balancer periodically pings servers

Example:

Every 5 seconds:
  Load balancer → Server 1: GET /health
  Response: 200 OK ✅
  
  Load balancer → Server 2: GET /health
  Response: Timeout ❌
  
  Load balancer → Server 3: GET /health
  Response: 200 OK ✅

Result: Remove Server 2 from rotation
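
A simplified active checker in Python (a real one would also require several consecutive failures before pulling a server, like the fall/rise thresholds in the config below):

import time
import urllib.request

def check_health(servers, path='/health', timeout=2):
    healthy = []
    for server in servers:
        try:
            # Any 2xx response within the timeout counts as healthy
            with urllib.request.urlopen(f'http://{server}{path}', timeout=timeout) as resp:
                if 200 <= resp.status < 300:
                    healthy.append(server)
        except OSError:
            pass  # Timeout, connection error, or HTTP error: skip this server
    return healthy

# Usage: re-check every 5 seconds, route traffic only to `healthy`
while True:
    healthy = check_health(['backend1.example.com', 'backend2.example.com'])
    time.sleep(5)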

Nginx Configuration (note: the check directive comes from the third-party nginx_upstream_check_module; Nginx Plus uses its built-in health_check directive instead):

upstream backend {
    server backend1.example.com;
    server backend2.example.com;
    server backend3.example.com;
    
    # Probe every 5 seconds, time out after 2 seconds,
    # mark down after 3 failures (fall), back up after 2 successes (rise)
    check interval=5000 fall=3 rise=2 timeout=2000;
}

Passive Health Checks

Strategy: Detect failures based on actual traffic

Example:

Request to Server 2 → Error 500
Request to Server 2 → Error 500
Request to Server 2 → Timeout

After 3 failures in 30 seconds:
  → Mark Server 2 as unhealthy
  → Stop sending traffic

When to use: As a complement to active checks (catches issues between probes). Open-source Nginx supports passive checks out of the box via the max_fails and fail_timeout parameters; a sketch of the counting logic follows.
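
Here's that logic in Python (thresholds mirror the example above):

import time

class PassiveHealthTracker:
    def __init__(self, max_fails=3, window_seconds=30):
        self.max_fails = max_fails
        self.window_seconds = window_seconds
        self.failures = {}  # server -> timestamps of recent failures
    
    def record_failure(self, server):
        # Called when a real request hits a 5xx error or times out
        self.failures.setdefault(server, []).append(time.time())
    
    def is_healthy(self, server):
        # Unhealthy once max_fails failures land inside the window
        cutoff = time.time() - self.window_seconds
        recent = [t for t in self.failures.get(server, []) if t > cutoff]
        self.failures[server] = recent  # Drop expired entries
        return len(recent) < self.max_fails

# Usage
tracker = PassiveHealthTracker()
for _ in range(3):
    tracker.record_failure('Server2')
print(tracker.is_healthy('Server2'))  # False: take it out of rotation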


Layer 4 vs Layer 7 Load Balancing

Layer 4 (Transport Layer)

Routes based on: IP address, port

Pros:
✅ Fast (no packet inspection)
✅ Low latency
✅ Works with any TCP-based protocol (HTTP, WebSocket, raw TCP)

Cons:
❌ Cannot make routing decisions based on content
❌ Less flexible

Example:

All traffic to port 443 → Backend server pool
(Doesn't look at URL, headers, etc.)

Real Example: AWS Network Load Balancer


Layer 7 (Application Layer)

Routes based on: HTTP headers, URL path, cookies

Pros:
✅ Advanced routing rules
✅ Content-based routing
✅ SSL termination

Cons:
❌ Slower (must parse HTTP)
❌ Higher latency

Example:

example.com/api/*    → API server pool
example.com/images/* → Static file server pool
example.com/*        → Web server pool

Nginx Configuration:

http {
    # Pool of API servers
    upstream api_servers {
        server api1.example.com;
        server api2.example.com;
    }
    
    # Pool of web servers
    upstream web_servers {
        server web1.example.com;
        server web2.example.com;
    }
    
    server {
        # Path-based routing: /api/ requests go to the API pool
        location /api/ {
            proxy_pass http://api_servers;
        }
        
        # Everything else goes to the web pool
        location / {
            proxy_pass http://web_servers;
        }
    }
}

Real Example: AWS Application Load Balancer, Nginx, HAProxy


Global vs Local Load Balancing

Local Load Balancing

Scope: Within a single data center

Example:

User request → Data center
    ↓
Load balancer distributes across 10 servers

Global Load Balancing

Scope: Across multiple data centers worldwide

Example:

User in Tokyo
    ↓
DNS returns Tokyo data center IP
    ↓
Tokyo load balancer → Tokyo servers

User in London
    ↓
DNS returns London data center IP
    ↓
London load balancer → London servers

Benefits:
✅ Low latency (users routed to nearest data center)
✅ Disaster recovery (failover to other regions)
✅ Compliance (data stays in region)

Real Example: Cloudflare, AWS Route 53, Google Cloud Load Balancing


System Design Interview Tips

Common Question: "Design Instagram"

Load balancing strategy:

1. DNS Load Balancing
   - Route users to nearest region (US, EU, Asia)
   
2. Regional Load Balancer
   - Distribute across multiple data centers in region
   
3. Application Load Balancer (Layer 7)
   - /api/* → API servers
   - /images/* → CDN/image servers
   - /* → Web servers
   
4. Algorithm
   - Least Connections (variable request times)
   - Health checks every 10 seconds
   
5. Sticky Sessions
   - Not needed (stateless design with Redis sessions)

What to Mention

✅ Multiple layers of load balancing (DNS, regional, local)
✅ Algorithm choice with justification
✅ Health checks (active + passive)
✅ Layer 7 for content-based routing
✅ Handling failures (remove unhealthy servers)

Avoid These Mistakes

❌ Not explaining which algorithm you chose and why
❌ Ignoring health checks
❌ Forgetting geographic distribution
❌ Not considering session stickiness (if needed)


Practical Tips

1. Start Simple

You don't need a load balancer until:

  • You have 2+ servers
  • A single server becomes the bottleneck

Start with: Single server, vertical scaling

Scale to: Load balancer + horizontal scaling


2. Choose the Right Algorithm

Quick guide:

  • Uniform requests, identical servers: Round Robin
  • Variable request times: Least Connections
  • Different server capacities: Weighted Round Robin
  • Need session persistence: Consistent Hashing + Redis sessions (avoid IP hash)

3. Monitor Everything

Key metrics:

  • Requests per second per server
  • Average response time per server
  • Error rate per server
  • Connection count per server

Use: Prometheus + Grafana, AWS CloudWatch, Datadog


4. Test Failure Scenarios

Simulate:

  • Kill a server (does traffic reroute?)
  • Slow down a server (does load balancer detect?)
  • Spike traffic (does autoscaling trigger?)

Real Example: Netflix Chaos Monkey (randomly kills production servers to test resilience)


Conclusion

Load balancing is the backbone of scalable systems:

  • Distributes load across multiple servers
  • Eliminates single point of failure
  • Enables horizontal scaling
  • Reduces latency (geographic routing)

Key takeaways:

  1. Use Layer 7 load balancing for content-based routing
  2. Least Connections algorithm for most cases
  3. Always implement health checks
  4. Go stateless (no sticky sessions if possible)
  5. Load balance at multiple layers (DNS, regional, local)

Remember: 10 servers without a load balancer = wasted money and poor performance. 10 servers with a load balancer = scalable, resilient system.

Master load balancing, and you'll never face the "one server at 95%, nine at 10%" problem again.


Cover image by Christophe Hautier on Unsplash

Support My Work

If this guide helped you learn something new, solve a problem, or ace your interviews, I'd really appreciate your support! Creating comprehensive, free content like this takes significant time and effort. Your support helps me continue sharing knowledge and creating more helpful resources for developers and students.

Buy me a Coffee

Every contribution, big or small, means the world to me and keeps me motivated to create more content!
