
Load Balancing

How one entry point can spread traffic across many servers — and keep going when one of them dies.

Imagine a single web server handling every request to your app. It works fine on launch day. Then you get popular, traffic doubles, and that one server starts to sweat — requests queue up, latency climbs, and eventually it falls over. You could buy a beefier machine (vertical scaling), but there's always a ceiling, and a single machine is a single point of failure.

Load balancing takes the other path. Instead of one big server, you run several identical ones and put a traffic cop in front of them.

The problem: one server can't do it all

When all traffic funnels into a single server, two things go wrong as you grow:

  1. Capacity — one machine can only handle so many concurrent requests before it saturates CPU, memory, or network.
  2. Availability — if that machine restarts or crashes, your entire app is down.

Both problems have the same shape: you've put all your eggs in one basket.

Figure: Everything depends on one machine. Every client hits the same server, which is both the bottleneck and the single point of failure.

How it works

A load balancer sits between clients and your servers. Clients connect to it — they never address the servers directly. For each incoming request, the balancer picks one server from a pool of identical ones and forwards the request there.

Because the servers are interchangeable, it doesn't matter which one handles any given request. Go from one server to three and you've roughly tripled your capacity, as long as nothing they share (such as the database) becomes the new bottleneck. This is horizontal scaling: you grow by adding machines, not by enlarging one.
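To make the pick-and-forward idea concrete, here is a minimal sketch in Python. It is a toy, not a real network proxy: the `LoadBalancer` class, the `make_server` helper, and the server names are all made up for illustration, and each "server" is just a function.

```python
import itertools


class LoadBalancer:
    """Toy balancer: forwards each request to the next server in the pool."""

    def __init__(self, servers):
        # `servers` is a list of callables standing in for real backends.
        self._cycle = itertools.cycle(servers)

    def handle(self, request):
        server = next(self._cycle)  # pick one server from the pool
        return server(request)      # forward the request and relay the reply


def make_server(name):
    # Hypothetical backend: every server answers the same way,
    # which is what makes them interchangeable.
    def serve(request):
        return f"{name} handled {request}"
    return serve


pool = [make_server("server-1"), make_server("server-2"), make_server("server-3")]
balancer = LoadBalancer(pool)

# Clients only ever talk to the balancer, never to a server directly.
replies = [balancer.handle(f"GET /page/{i}") for i in range(4)]
for line in replies:
    print(line)
```

Growing the pool is just a longer list passed to `LoadBalancer`; the clients' side of the conversation does not change.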

Figure: How a load balancer spreads traffic (Clients, Load Balancer, Servers 1 to 3, Database). Requests arrive and are spread across the pool.
Note

Identical servers are the key precondition. Load balancing only works cleanly when any server can handle any request. That usually means servers are stateless — they keep no per-user data in local memory. Anything that must persist (sessions, uploads) lives in a shared store like a database or cache.
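As a rough illustration of the note above, the sketch below keeps a shopping cart in a shared store instead of on any one server. The `shared_sessions` dict is only a stand-in for a real database or cache, and `StatelessServer` and `add_to_cart` are hypothetical names.

```python
# Stand-in for a shared store such as a database or cache (an assumption:
# a real deployment would use a networked service, not an in-process dict).
shared_sessions = {}


class StatelessServer:
    def __init__(self, name):
        self.name = name  # no per-user data is kept on the instance

    def add_to_cart(self, session_id, item):
        # Read and write the cart in the shared store, never in local memory.
        cart = shared_sessions.setdefault(session_id, [])
        cart.append(item)
        return f"{self.name}: cart={cart}"


server_a = StatelessServer("server-1")
server_b = StatelessServer("server-2")

# Two requests from the same user land on different servers.
server_a.add_to_cart("user-42", "book")
reply = server_b.add_to_cart("user-42", "pen")
print(reply)
```

The second server sees the item added through the first one, so the balancer is free to send each request anywhere.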

Routing around failure

The load balancer continuously runs health checks: small periodic requests to each server ("are you alive?"). When a server stops responding, the balancer marks it unhealthy and stops sending it traffic. Requests already in flight on that server may fail, but new requests quietly flow to the healthy servers, so most users never notice.

When the sick server recovers and starts passing health checks again, it's added back to the rotation. This is what turns a pile of servers into a resilient system.
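The bookkeeping behind this can be sketched in a few lines of Python. Everything here is illustrative: the `Pool` class is made up, and the `probe` function reads a simulated `status` dict where a real balancer would send a network request.

```python
class Pool:
    """Tracks which servers are currently passing health checks."""

    def __init__(self, servers, probe):
        self.servers = list(servers)
        self.probe = probe                  # callable: server name -> bool
        self.healthy = set(self.servers)

    def run_health_checks(self):
        for server in self.servers:
            if self.probe(server):
                self.healthy.add(server)      # passing: in (or back in) rotation
            else:
                self.healthy.discard(server)  # failing: stop sending traffic

    def candidates(self):
        # Keep the original pool order so routing stays predictable.
        return [s for s in self.servers if s in self.healthy]


# Simulated server status; a real probe would issue a request instead.
status = {"server-1": True, "server-2": True, "server-3": True}
pool = Pool(status.keys(), probe=lambda name: status[name])

status["server-2"] = False        # server 2 goes down
pool.run_health_checks()
after_failure = pool.candidates()

status["server-2"] = True         # server 2 comes back
pool.run_health_checks()
after_recovery = pool.candidates()

print(after_failure)
print(after_recovery)
```

This version flips a server's state on a single check. Production balancers usually wait for several consecutive failures or successes before changing their mind, so one slow response doesn't eject a healthy server.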

Figure: A failed server is routed around (Clients, Load Balancer, Servers 1 to 3, Database). Server 2 fails its health check, so the balancer sends traffic only to the healthy servers.

How does it choose a server?

The routing algorithm decides which server gets each request:

  • Round-robin — hand requests out in a cycle: 1, 2, 3, 1, 2, 3… Simple and even when requests cost about the same.
  • Least connections — send the next request to whichever server is currently handling the fewest. Better when some requests are much heavier than others.
  • Hashing — derive the server from something stable, like the client's IP or a URL. The same input always maps to the same server, which is useful for cache locality.
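Here is one way the three strategies above might look in Python. It is a sketch under simple assumptions: the pool is a fixed list of names, connection counts are handed in as a dict, and the function names (`round_robin`, `least_connections`, `hashed`) are invented for this example.

```python
import hashlib
import itertools

servers = ["server-1", "server-2", "server-3"]

# Round-robin: cycle through the pool in order.
_rotation = itertools.cycle(servers)


def round_robin():
    return next(_rotation)


# Least connections: pick the server with the fewest requests in flight.
def least_connections(active):
    # `active` maps server name -> current open connection count.
    return min(servers, key=lambda name: active[name])


# Hashing: derive the server from a stable key such as a client IP.
def hashed(key):
    # Use a stable digest; Python's built-in hash() of a str can differ
    # from one process to the next.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


rr_picks = [round_robin() for _ in range(4)]
lc_pick = least_connections({"server-1": 12, "server-2": 3, "server-3": 7})
same_server = hashed("203.0.113.9") == hashed("203.0.113.9")

print(rr_picks)
print(lc_pick)
print(same_server)
```

One caveat on the hashing version: with a plain modulo like this, the mapping only stays put while the pool size is unchanged. Add or remove a server and many keys land somewhere new.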
Watch out

Sticky sessions pin a given user to the same server for their whole session. It's a quick fix when servers do hold local state — but it undermines even distribution and makes failures more disruptive (a dead server takes its users' sessions with it). Prefer stateless servers with a shared session store instead.
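The failure mode described above is easy to reproduce in a toy model. In this sketch, `StickyBalancer` and its methods are hypothetical, and each server's "local memory" is just a dict that disappears when the server is killed.

```python
class StickyBalancer:
    """Pins each user to one server; session state lives on that server."""

    def __init__(self, servers):
        # server name -> that server's local, in-memory sessions
        self.sessions = {name: {} for name in servers}
        self.pinned = {}  # user -> server name

    def route(self, user):
        server = self.pinned.get(user)
        if server not in self.sessions:  # first visit, or pinned server died
            alive = sorted(self.sessions)
            server = alive[len(self.pinned) % len(alive)]
            self.pinned[user] = server
        return server

    def login(self, user):
        self.sessions[self.route(user)][user] = {"logged_in": True}

    def is_logged_in(self, user):
        return user in self.sessions[self.route(user)]

    def kill(self, server):
        del self.sessions[server]  # its local sessions vanish with it


lb = StickyBalancer(["server-1", "server-2"])
lb.login("alice")  # pinned to server-1
lb.login("bob")    # pinned to server-2

before = lb.is_logged_in("alice")
lb.kill("server-1")
after = lb.is_logged_in("alice")  # re-routed to server-2, session is gone

print(before, after, lb.is_logged_in("bob"))
```

Alice is still served after the failure, but she has to log in again. With a shared session store, the surviving server would have found her session.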

When to reach for it

Reach for a load balancer when you need to scale past one machine or you need redundancy so a single failure doesn't take you offline — which, in practice, is almost any production web service. It's one of the most common building blocks in system design, and it pairs naturally with techniques like caching (to cut work per request) and circuit breakers (to handle downstream failures gracefully).

Key takeaways

  • A load balancer is a single front door that distributes requests across a pool of identical servers.
  • It enables horizontal scaling: add more servers to handle more traffic, rather than buying a bigger one.
  • Health checks let it route around failed servers automatically, improving availability.
  • The routing algorithm (round-robin, least-connections, hashing) shapes how evenly work is spread.
  • Sticky sessions trade even distribution for keeping a user pinned to one server.
