Load Balancing Explained: How to Distribute Traffic at Scale

When a single server can't handle your traffic, you scale horizontally — adding more servers. But adding servers is only half the solution. You need something to distribute incoming requests across them. That's what a load balancer does.

What Is a Load Balancer?

A load balancer sits between your clients and your backend servers. It accepts incoming connections and routes each request to one of several available backend instances based on a chosen algorithm.

Client
  ↓
Load Balancer
  ├── App Server 1 (192.168.1.10:3000)
  ├── App Server 2 (192.168.1.11:3000)
  └── App Server 3 (192.168.1.12:3000)

The client does not know which backend handled its request. From its perspective, there is a single server at the load balancer's IP.

Layer 4 vs Layer 7 Load Balancing

Layer 4 (Transport Layer): Routes based on TCP/UDP connection data such as source and destination IP addresses and ports. Fast, but has no visibility into HTTP content.

Layer 7 (Application Layer): Routes based on HTTP data such as URL path, headers, cookies, and query parameters. Can make content-aware decisions like routing /api to an API cluster and / to a frontend cluster.

Most modern web applications use Layer 7 load balancing.
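The path-based routing described above can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer implements it; the pool addresses and the `choose_pool` helper are hypothetical, and request paths are assumed to start with "/".

```python
# Hypothetical backend pools keyed by URL prefix (addresses are illustrative).
POOLS = {
    "/api": ["10.0.1.10:3000", "10.0.1.11:3000"],  # API cluster
    "/": ["10.0.2.10:8080", "10.0.2.11:8080"],     # frontend cluster
}


def choose_pool(path: str) -> list[str]:
    """Return the pool whose prefix is the longest match for the request path."""
    matches = [
        prefix
        for prefix in POOLS
        # Match the prefix exactly or on a path-segment boundary,
        # so "/apix" does not count as being under "/api".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    return POOLS[max(matches, key=len)]
```

A Layer 4 balancer could not make this decision, because the path only becomes visible once the HTTP request has been parsed.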

Common Load Balancing Algorithms

Round Robin

Requests are distributed to servers in sequence: 1 → 2 → 3 → 1 → 2 → 3.

upstream app_servers {
    server 192.168.1.10:3000;
    server 192.168.1.11:3000;
    server 192.168.1.12:3000;
}

Simple and effective when all servers have equal capacity.
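To make the rotation concrete, here is a minimal sketch of the selection logic in Python. It only illustrates the idea and says nothing about how nginx implements it internally; the `round_robin` helper is a name chosen for this example.

```python
import itertools

SERVERS = ["192.168.1.10:3000", "192.168.1.11:3000", "192.168.1.12:3000"]


def round_robin(servers):
    """Return an iterator that yields the servers in a repeating sequence."""
    return itertools.cycle(servers)


picker = round_robin(SERVERS)
# Six requests walk through the list twice: 1, 2, 3, 1, 2, 3.
first_six = [next(picker) for _ in range(6)]
```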

Least Connections

Routes each new request to the server with the fewest active connections. Better for apps with variable-length requests.

upstream app_servers {
    least_conn;
    server 192.168.1.10:3000;
    server 192.168.1.11:3000;
    server 192.168.1.12:3000;
}
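The bookkeeping behind least connections might look like the sketch below. It assumes the balancer is told when a request starts and when it finishes; the `LeastConnections` class and its method names are hypothetical, and ties are broken by list order.

```python
class LeastConnections:
    """Track active connections per server and pick the least busy one."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count, so ties
        # are broken by the order in which servers were listed.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the request completes or the connection closes.
        self.active[server] -= 1
```

A server stuck on a slow request keeps its count high and is skipped until it calls `release`, which is why this algorithm suits variable-length requests.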

IP Hash

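Routes requests from the same client IP to the same server. Useful for stateful apps that store session data in memory. Keep in mind that clients behind a shared NAT or proxy present the same IP, so they all land on the same server.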

upstream app_servers {
    ip_hash;
    server 192.168.1.10:3000;
    server 192.168.1.11:3000;
}
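The idea of mapping a client IP to a fixed server can be sketched as follows. This is an illustration only: nginx's ip_hash uses its own hash function and, for IPv4, keys on the first three octets of the address, whereas this sketch hashes the whole address with SHA-256. The `pick_by_ip` helper is hypothetical.

```python
import hashlib


def pick_by_ip(client_ip, servers):
    """Map a client IP to one server, deterministically."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce modulo the pool size.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

With a plain modulo like this, adding or removing a server changes the mapping for most clients, so in-memory sessions are lost whenever the pool changes.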

Weighted Round Robin

Assigns weights to servers based on capacity. A server with weight 3 gets 3x the requests of one with weight 1.

upstream app_servers {
    server 192.168.1.10:3000 weight=3;
    server 192.168.1.11:3000 weight=1;
}
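One common way to honor weights while still interleaving servers is often called smooth weighted round robin. The sketch below is an assumption-laden illustration of that variant, not a description of what any given load balancer runs; the `smooth_weighted` function is a name chosen for this example.

```python
def smooth_weighted(weights, count):
    """Return `count` picks that follow the weights while interleaving servers."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(count):
        # Every server earns credit equal to its weight on each round.
        for server, weight in weights.items():
            current[server] += weight
        # The server with the most credit wins and pays back the total.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


picks = smooth_weighted({"192.168.1.10:3000": 3, "192.168.1.11:3000": 1}, 8)
```

Over any window of four requests, the weight-3 server receives three and the weight-1 server receives one.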

Health Checks

A load balancer must know when a backend goes down. Health checks detect unhealthy servers and remove them from rotation until they recover. There are two kinds:

Passive health checks detect failures when real requests fail.

Active health checks proactively probe backends on a schedule.

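upstream app_servers {
    # Passive checks (open-source nginx): after 3 failed attempts within 30s,
    # the server is skipped for the next 30s
    server 192.168.1.10:3000 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:3000 max_fails=3 fail_timeout=30s;

    # Active checks need NGINX Plus: add a shared-memory `zone` here and put
    # `health_check interval=5s fails=2 passes=3;` in the location block that
    # proxies to this upstream, not inside the upstream block itself
}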

With HAProxy (active health checks):

backend app_servers
    balance roundrobin
    option httpchk GET /health
    server app1 192.168.1.10:3000 check inter 5s fall 2 rise 3
    server app2 192.168.1.11:3000 check inter 5s fall 2 rise 3
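The `fall 2 rise 3` thresholds above amount to a small state machine: a healthy server is marked down after two consecutive failed probes, and a down server is marked up again after three consecutive successes. The sketch below illustrates that logic under the assumption that probe results arrive one at a time; the `HealthState` class is hypothetical and is not HAProxy code.

```python
class HealthState:
    """Track one backend's health using consecutive probe results."""

    def __init__(self, fall=2, rise=3):
        self.fall = fall      # failures in a row needed to mark the server down
        self.rise = rise      # successes in a row needed to mark it up again
        self.healthy = True
        self.streak = 0       # consecutive results contradicting the current state

    def record(self, probe_ok):
        """Feed one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            # The result agrees with the current state, so the streak resets.
            self.streak = 0
            return self.healthy
        self.streak += 1
        threshold = self.fall if self.healthy else self.rise
        if self.streak >= threshold:
            self.healthy = not self.healthy
            self.streak = 0
        return self.healthy
```

Requiring several results in a row keeps a single dropped probe from removing a working server, and keeps a flapping server from rejoining too early.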

Session Persistence (Sticky Sessions)

Some apps store session state locally, meaning the same user must always hit the same server. This is called sticky sessions or session affinity.

Better architectural pattern: store sessions in a shared store (Redis, database) so any server can handle any request. This makes your app truly stateless and horizontally scalable.

# Example: connect a Redis instance for session storage
# Session data is shared across all app instances
REDIS_URL=redis://redis-host:6379
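The following sketch shows why a shared store removes the need for sticky sessions. To stay self-contained it uses an in-memory dictionary as a stand-in for Redis; the `SharedSessionStore` and `AppServer` classes are hypothetical and exist only for this illustration.

```python
import json
import uuid


class SharedSessionStore:
    """In-memory stand-in for a shared store such as Redis (illustration only)."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, session):
        # Serialize as a real network store would require.
        self._data[session_id] = json.dumps(session)

    def load(self, session_id):
        raw = self._data.get(session_id)
        return json.loads(raw) if raw is not None else None


class AppServer:
    """An app instance that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self.store.save(session_id, {"user": user})
        return session_id

    def whoami(self, session_id):
        session = self.store.load(session_id)
        return session["user"] if session else None


store = SharedSessionStore()
app1, app2 = AppServer("app1", store), AppServer("app2", store)
sid = app1.login("alice")   # the first request lands on app1
user = app2.whoami(sid)     # a later request lands on app2 and still works
```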

DNS Load Balancing

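A simpler (though less sophisticated) approach is DNS-based load balancing: your domain has multiple A records, each pointing to a different server. Resolvers return all of the records, often in rotated or shuffled order, and most clients connect to the first address they receive, which spreads traffic roughly evenly.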

example.com  A  203.0.113.10
example.com  A  203.0.113.11
example.com  A  203.0.113.12

Plain DNS has no health checks: if one server dies, some clients keep being routed to it until the record is removed and cached answers expire (TTL).
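The TTL problem can be demonstrated with a toy resolver cache. This is a simulation under simplifying assumptions (a single cache, time passed in as a number of seconds); the `CachingResolver` class is hypothetical and makes no real DNS queries.

```python
class CachingResolver:
    """Toy resolver cache that remembers an answer for `ttl` seconds."""

    def __init__(self, zone, ttl):
        self.zone = zone      # authoritative records: name -> list of IPs
        self.ttl = ttl
        self._cache = {}      # name -> (expires_at, records)

    def resolve(self, name, now):
        cached = self._cache.get(name)
        if cached and now < cached[0]:
            return cached[1]  # still fresh, so the zone is not consulted
        records = list(self.zone[name])
        self._cache[name] = (now + self.ttl, records)
        return records


zone = {"example.com": ["203.0.113.10", "203.0.113.11", "203.0.113.12"]}
resolver = CachingResolver(zone, ttl=300)

before = resolver.resolve("example.com", now=0)
zone["example.com"].remove("203.0.113.11")        # operator removes the dead server
stale = resolver.resolve("example.com", now=120)  # cache still lists the dead IP
fresh = resolver.resolve("example.com", now=300)  # TTL expired, record is gone
```

A shorter TTL narrows that window at the cost of more DNS lookups.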

Global Load Balancing (GeoDNS)

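For global applications, route users to the nearest data center using GeoDNS, which answers the same DNS query with different IP addresses depending on where the query comes from. Services like Cloudflare, AWS Route 53, and Fastly handle this at the DNS layer.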
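As a rough illustration of "nearest data center", the sketch below picks the closest region by great-circle distance. The region names and coordinates are hypothetical, and real GeoDNS services typically work from the resolver's IP geolocation or measured latency rather than exact user coordinates.

```python
import math

# Hypothetical regions with approximate (latitude, longitude) coordinates.
DATA_CENTERS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest(user_location):
    """Return the name of the data center closest to the user."""
    return min(
        DATA_CENTERS,
        key=lambda name: haversine_km(user_location, DATA_CENTERS[name]),
    )
```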

Load Balancing on PandaStack

When you deploy Docker containers on PandaStack and scale them, traffic is automatically distributed across container instances. You don't configure the load balancer — the platform handles it. Combined with automatic HTTPS for your custom domain, your app is ready to scale from day one. Manage everything at [dashboard.pandastack.io](https://dashboard.pandastack.io) and read more at [docs.pandastack.io](https://docs.pandastack.io).

Conclusion

Load balancing is the foundation of horizontal scalability. Understand your app's session management needs (stateless vs. stateful) before choosing an algorithm. Use active health checks to detect failures fast, and prefer stateless architectures to avoid sticky session complexity.
