A load balancer sits between users and servers, checks each request, then sends it to a healthy machine using a routing rule.
You’ve got users hitting an app, and you’ve got more than one server that can answer. Without a “traffic cop,” you get spikes, uneven load, and one unlucky node that melts down while others idle.
A load balancer fixes that by taking incoming connections, picking a backend target, and repeating that choice at scale.
This article breaks down what’s happening on the wire, which parts matter most in production, and what to watch when things go sideways.
What A Load Balancer Is And Why Teams Use One
A load balancer is a network service (hardware, software, or managed cloud) that receives client traffic and forwards it to a pool of backends. Backends can be VMs, containers, bare-metal servers, functions, or other proxies.
People add load balancing for three plain reasons: spread traffic across many machines, keep the app reachable when a machine fails, and scale without changing client behavior.
It also gives you one stable “front door” endpoint. Clients connect to that endpoint, while the pool behind it can change hour to hour.
How Does A Load Balancer Work? In Plain Terms
Start with the simplest mental model: the client connects to the load balancer, the load balancer selects a backend, then the backend returns a response. That sounds small, but the details decide whether your app feels snappy or flaky.
What Happens Step By Step
- Client opens a connection. The user’s device resolves DNS, then connects to the load balancer IP (or a set of IPs).
- The load balancer accepts the connection. A listener is waiting on a port and protocol (HTTP, HTTPS, TCP, UDP).
- Routing rules kick in. The balancer checks where the request should go: a target group, a service, a path-based rule, or a weighted split.
- Health status is checked. The balancer uses its last health results to avoid backends that are failing.
- A backend is chosen. The selection follows an algorithm like round robin or least connections.
- The request is forwarded. The balancer proxies the request or passes the connection through, depending on type.
- The response flows back. The backend replies, and the client receives the response through the balancer.
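The steps above can be sketched as one small pipeline. This is a toy model, not how any particular product is built: `Backend`, `TinyBalancer`, and the prefix-to-pool mapping are hypothetical names, the "network hop" is a plain method call, and health is a flag someone else is assumed to keep up to date.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Backend:
    name: str
    healthy: bool = True  # last known health result, updated elsewhere

    def handle(self, path: str) -> str:
        # Stand-in for forwarding the request over the network.
        return f"{self.name} served {path}"


class TinyBalancer:
    def __init__(self, pools):
        self.pools = pools        # path prefix -> list of Backend
        self._counter = count()   # shared round-robin counter

    def route(self, path: str) -> str:
        # Routing rule: the longest matching path prefix wins.
        matches = [p for p in self.pools if path.startswith(p)]
        if not matches:
            return "404 no matching rule"
        pool = self.pools[max(matches, key=len)]
        # Health: skip targets whose last check failed.
        eligible = [b for b in pool if b.healthy]
        if not eligible:
            return "503 no healthy backend"
        # Selection: round robin over the eligible targets.
        backend = eligible[next(self._counter) % len(eligible)]
        # Forward the request and relay the response.
        return backend.handle(path)


lb = TinyBalancer({
    "/": [Backend("web-1"), Backend("web-2", healthy=False)],
    "/api": [Backend("api-1")],
})
print(lb.route("/api/users"))
print(lb.route("/index.html"))
```

In this sketch `web-2` never receives traffic because its last health result is a failure, which is the behavior the health step describes.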
Two Connection Styles You’ll Hear About
Proxy mode: the balancer terminates the client connection, then opens a new connection to the backend. This is common for HTTP/HTTPS and gives the balancer visibility into requests.
Pass-through mode: the balancer forwards the connection without terminating it at the application level, usually operating on TCP or UDP. This can be leaner, but the balancer sees less of the request.
How A Load Balancer Decides Where Traffic Goes
The decision is a mix of rules and algorithms. Rules map requests into a set of eligible targets. Algorithms pick one target from that set.
Common Routing Rules
- Host-based: send api.example.com to the API service and app.example.com to the web service.
- Path-based: send /images to an image service and /checkout to a payment service.
- Weighted splits: send 90% to v1 and 10% to v2 during a rollout.
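A rough sketch of how host, path, and weighted rules might be evaluated follows. The rule format (dicts with optional `host` and `path_prefix` keys), the group names, and the "first match wins" policy are assumptions made for illustration; real products each have their own rule syntax and precedence.

```python
import random


def pick_group(host, path, rules, default="web"):
    # First matching rule wins, so rule order matters.
    for rule in rules:
        if rule.get("host") not in (None, host):
            continue  # rule names a different host
        if not path.startswith(rule.get("path_prefix", "/")):
            continue  # rule names a different path prefix
        return rule["group"]
    return default


def weighted_split(weights, rng=random):
    # Pick one version at random, in proportion to its weight.
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rules = [
    {"host": "api.example.com", "group": "api"},
    {"path_prefix": "/images", "group": "images"},
    {"path_prefix": "/checkout", "group": "payments"},
]
print(pick_group("api.example.com", "/v1/users", rules))
print(pick_group("app.example.com", "/images/logo.png", rules))
print(weighted_split({"v1": 90, "v2": 10}))
```

Because the split is random per request, a 90/10 weighting only approaches 90/10 over many requests; a small sample can look lopsided.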
Common Selection Algorithms
- Round robin: cycle through targets in order.
- Weighted round robin: give stronger machines more traffic.
- Least connections: prefer the target with fewer active connections.
- Hash-based: pick a target by hashing a value like client IP, a header, or a session id.
Round robin is easy and works fine when backends are similar. Least connections is handy when requests have uneven duration. Hashing helps keep a user “stuck” to one backend when the app stores session state locally.
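The four algorithms are small enough to sketch side by side. These are simplified versions under stated assumptions: the weighted variant naively repeats targets in proportion to integer weights, and the hash variant uses SHA-256 with a plain modulo, which is only one of several possible choices.

```python
import hashlib
from itertools import cycle


def round_robin(targets):
    # Endless iterator that walks the targets in order.
    return cycle(targets)


def weighted_round_robin(weights):
    # Naive expansion: a target with weight 2 appears twice per cycle.
    expanded = [t for t, w in weights.items() for _ in range(w)]
    return cycle(expanded)


def least_connections(active):
    # active maps target -> number of open connections.
    return min(active, key=active.get)


def hash_pick(key, targets):
    # The same key always maps to the same target while the pool is unchanged.
    digest = hashlib.sha256(key.encode()).digest()
    return targets[int.from_bytes(digest[:8], "big") % len(targets)]


rr = round_robin(["a", "b", "c"])
print([next(rr) for _ in range(4)])                  # ['a', 'b', 'c', 'a']
print(least_connections({"a": 5, "b": 2, "c": 7}))   # b
print(hash_pick("203.0.113.7", ["a", "b", "c"]))
```

One caveat with the modulo approach: adding or removing a target changes the mapping for most keys, so sticky users get reshuffled. Consistent hashing is the usual way to limit that.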
Layer 4 Vs Layer 7 Load Balancing
These labels describe what the balancer can “see.” Layer 4 is about TCP/UDP connections. Layer 7 is about application data like HTTP methods, paths, and headers.
Layer 4 In Practice
Layer 4 balancing makes choices using connection metadata: source IP, destination port, and protocol. It’s a solid fit for raw TCP services, databases that should not be proxied by an HTTP-aware layer, or traffic where you want minimal overhead.
Layer 7 In Practice
Layer 7 balancing can route based on the actual request. That means path routing, host routing, request rewrites, and cleaner troubleshooting. It also pairs well with TLS termination, since the balancer can decrypt and inspect traffic before forwarding it.
Health Checks: The Part That Keeps Outages Small
A load balancer is only as smart as its health checks. A “healthy” target should be able to do real work, not just accept a socket.
Most systems run periodic probes. If a target fails a number of checks in a row, it’s marked unhealthy and removed from routing. When it starts passing again, it’s added back.
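The "fails several checks in a row" logic can be sketched as a small state tracker. The class name `HealthTracker` and both thresholds are illustrative assumptions; in particular, requiring more than one passing check before re-adding a target is a common option, not something every balancer does.

```python
class HealthTracker:
    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, ok: bool) -> bool:
        """Feed in one probe result and return the current routing state."""
        if ok == self.healthy:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if ok else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = ok
                self._streak = 0
        return self.healthy


tracker = HealthTracker()
for result in [False, False, False, True, True]:
    print(result, "->", tracker.record(result))
```

Counting consecutive results rather than reacting to a single probe is what keeps one slow response from pulling a target out of rotation.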
What A Good Health Check Looks Like
- Fast: quick endpoint, low payload, short timeout.
- Meaningful: returns success only when core dependencies are usable.
- Stable: avoids random flaps from slow downstream calls.
- Separate from user traffic: avoid endpoints that need auth tokens that can expire unnoticed.
If you run a managed cloud balancer, its documentation usually spells out how health checks work and that traffic is routed only to healthy targets. AWS describes this behavior in its explanation of How Elastic Load Balancing works.
Table: Core Load Balancer Parts And What They Do
The names vary across products, but the pieces below show up in most setups.
| Part | What It Does | What To Watch |
|---|---|---|
| Listener | Accepts connections on a port and protocol | Wrong protocol, missing HTTPS redirect, open ports |
| Routing rules | Maps a request into a backend group | Rule order, shadowed paths, surprising defaults |
| Target group | Pool of eligible backends | Stale targets, wrong ports, cross-zone choices |
| Health check | Detects failing targets | Flapping, too-strict checks, timeouts too low |
| Selection algorithm | Picks one target for each request or connection | Uneven weights, sticky sessions masking imbalance |
| Connection draining | Lets in-flight requests finish during scale-in | Cutting connections mid-request, long drain windows |
| TLS termination | Handles certificates and decrypts HTTPS | Old ciphers, cert rotation, missing HSTS |
| Access logs | Records request metadata for tracing issues | Sampling gaps, missing request IDs, storage cost |
| Rate controls | Limits abusive traffic bursts | False positives, uneven limits per route |
Session Stickiness And Why It Can Surprise You
Some apps store session state in memory on the backend. If traffic bounces across machines, users get logged out or carts disappear. Stickiness keeps a given user tied to one backend for a while.
Safer Patterns Than Long-Term Stickiness
- Store sessions in a shared store (Redis, database) so any backend can serve the user.
- Keep stickiness windows short: long enough for bursts, not long enough to lock users to a sick node.
- Prefer stateless tokens when the app can handle it.
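A short stickiness window can be sketched as a table of bindings that expire. Everything here is hypothetical: `StickyTable`, the fixed (non-sliding) window, and the rule that a binding is dropped as soon as its target turns unhealthy are design choices for the example, not a description of any specific balancer.

```python
import time


class StickyTable:
    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._table = {}  # session id -> (target, expiry time)

    def pick(self, session_id, healthy, fallback):
        """Return the bound target if still valid, else bind a new one."""
        now = self.clock()
        entry = self._table.get(session_id)
        if entry is not None:
            target, expires = entry
            # Honor the binding only while it is fresh and the target is healthy.
            if now < expires and target in healthy:
                return target
        target = fallback(healthy)  # e.g. round robin or least connections
        self._table[session_id] = (target, now + self.ttl)
        return target


table = StickyTable(ttl_seconds=60)
choose_first = lambda healthy: sorted(healthy)[0]
print(table.pick("session-1", {"a", "b"}, choose_first))   # a
print(table.pick("session-1", {"a", "b"}, choose_first))   # a (still bound)
print(table.pick("session-1", {"b"}, choose_first))        # b (a left the healthy set)
```

Checking health before honoring the binding is what stops stickiness from pinning a user to a failing node.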
TLS Termination, Headers, And The Real Client IP
When the load balancer terminates TLS, it decrypts HTTPS and forwards plain HTTP to the backend (often inside a private network). That makes certificate rotation and cipher policy a balancer concern, not an app concern.
One gotcha: your backend might see the balancer’s IP as the client. Most systems add a header like X-Forwarded-For that carries the original client IP. Your app and logs should trust that header only from known balancers.
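Here is a minimal sketch of "trust the header only from known balancers." The `10.0.0.0/24` range is a made-up example of where balancers might live, and the code assumes exactly one trusted balancer hop that appends the address it saw to `X-Forwarded-For`. Header lookup is also simplified to an exact-case dict key.

```python
import ipaddress

# Hypothetical range where our own balancers live.
TRUSTED_BALANCERS = [ipaddress.ip_network("10.0.0.0/24")]


def client_ip(peer_ip: str, headers: dict) -> str:
    peer = ipaddress.ip_address(peer_ip)
    if not any(peer in net for net in TRUSTED_BALANCERS):
        # Not from our balancer: the header could be forged, so ignore it.
        return peer_ip
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [h.strip() for h in forwarded.split(",") if h.strip()]
    # With one trusted hop, the right-most entry is the address our balancer
    # saw. Anything to its left was supplied by the client.
    return hops[-1] if hops else peer_ip


print(client_ip("10.0.0.5", {"X-Forwarded-For": "198.51.100.9"}))   # 198.51.100.9
print(client_ip("203.0.113.50", {"X-Forwarded-For": "1.2.3.4"}))    # 203.0.113.50
```

With more than one trusted proxy in the chain, the usual approach is to walk the list from the right and skip trusted addresses; the single-hop case above is the simplest version.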
Observability: Signals That Tell You The Balancer Is The Bottleneck
If you’re troubleshooting, don’t guess. Start with three buckets: balancer health, backend health, and client behavior.
Balancer-Side Metrics Worth Tracking
- Request rate and connection rate
- Latency at the balancer (p50, p95, p99)
- 4xx and 5xx rates, split by route and target group
- Healthy vs unhealthy target counts over time
- TLS handshake errors and certificate failures
Table: Choosing The Right Style For Your App
This isn’t about brand names. It’s about matching features to the shape of your traffic.
| Need | Better Fit | Reason |
|---|---|---|
| Path and host routing | Layer 7 balancer | It can read HTTP requests and apply rules |
| Raw TCP or UDP services | Layer 4 balancer | It routes connections with low overhead |
| One pool with mixed server sizes | Weighted algorithm | Stronger nodes get more traffic |
| Requests with uneven duration | Least connections | It avoids piling long requests on one node |
| Local session state on backends | Short stickiness window | It reduces cross-node session breaks |
| Zero-downtime deployments | Drain plus health checks | Old nodes finish work before removal |
| Public HTTPS with cert rotation | TLS termination at balancer | Central cert handling and policy control |
Common Failure Modes And How To Avoid Them
Load balancing sounds simple until a busy day arrives. These are the problems that show up most often.
Health Checks That Lie
If your health endpoint returns success while the app can’t reach its database, the balancer will keep sending traffic into a dead end. Build a health endpoint that tests the dependencies that must be up for real requests to work.
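As a sketch of a health endpoint that does not lie, the function below reports success only when every listed dependency check passes. The `readiness` function, the check names, and the 200/503 convention are assumptions for illustration. The checks themselves are stand-ins for real, fast probes such as a cheap database query with a short timeout.

```python
from typing import Callable, Dict, Tuple


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run each dependency check and return an HTTP-style status plus a report."""
    report = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "failing"
        except Exception as exc:
            # A check that raises counts as a failure, not a crash.
            report[name] = f"error: {type(exc).__name__}"
    status = 200 if all(v == "ok" for v in report.values()) else 503
    return status, report


def database_check() -> bool:
    # Stand-in for a real probe; here it simulates an unreachable database.
    raise ConnectionError("connection refused")


print(readiness({"cache": lambda: True}))
print(readiness({"database": database_check, "cache": lambda: True}))
```

Only dependencies that real requests cannot work without belong in this list; checking optional downstream services here tends to cause the flapping described earlier.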
Sticky Sessions That Hide A Bad Node
When stickiness is long, a single degraded backend can keep hurting the same users for hours. Keep stickiness short, and alert on per-target error rates, not just global rates.
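To see why a global error rate can hide one bad node, here is a small sketch that computes 5xx rates per target. The log format, a list of `(target, status)` pairs, is an assumption. Real access logs need parsing first.

```python
from collections import Counter


def per_target_error_rates(log):
    """Return the share of 5xx responses for each target in the log."""
    total, errors = Counter(), Counter()
    for target, status in log:
        total[target] += 1
        if status >= 500:
            errors[target] += 1
    return {t: errors[t] / total[t] for t in total}


# 100 requests: target "a" is fine, target "b" fails half the time.
log = [("a", 200)] * 90 + [("b", 200)] * 5 + [("b", 503)] * 5
global_rate = sum(1 for _, s in log if s >= 500) / len(log)
print(global_rate)                    # 0.05
print(per_target_error_rates(log))    # {'a': 0.0, 'b': 0.5}
```

A 5% global rate may sit under an alert threshold, while the users stuck to `b` see half their requests fail.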
Timeout Mismatch
Your client, balancer, and backend each have timeouts. If the balancer times out at 30 seconds but your backend is allowed to run for 60, the balancer will cut the connection early and the backend will keep working on a request the client will never see.
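One way to catch this before production is a simple configuration check. The policy encoded below, where each outer layer waits longer than the layer it calls, is a common convention and an assumption here, not a universal rule. The function name is hypothetical.

```python
from typing import List


def timeout_problems(client_s: float, balancer_s: float, backend_s: float) -> List[str]:
    """Flag layers whose timeout is not shorter than the layer in front of them."""
    problems = []
    if backend_s >= balancer_s:
        problems.append(
            f"backend ({backend_s}s) can outlive the balancer ({balancer_s}s)"
        )
    if balancer_s >= client_s:
        problems.append(
            f"balancer ({balancer_s}s) can outlive the client ({client_s}s)"
        )
    return problems


# The mismatch from the paragraph above: balancer at 30s, backend at 60s.
print(timeout_problems(client_s=90, balancer_s=30, backend_s=60))
print(timeout_problems(client_s=60, balancer_s=30, backend_s=20))   # []
```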
No Plan For Deploys
During a deploy, instances restart, containers roll, ports change. If the balancer doesn’t drain connections and your health checks don’t reflect readiness, users get bursts of errors. Add a readiness endpoint, use draining, and delay routing until the app is ready.
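Draining can be sketched as a target that stops taking new requests but is removed only once its in-flight work ends or a drain timeout passes. `DrainingTarget` and its method names are hypothetical, and the sketch ignores concurrency. A real implementation would need locking or atomic counters.

```python
class DrainingTarget:
    def __init__(self):
        self.in_flight = 0
        self.draining = False

    def accepts_new(self) -> bool:
        return not self.draining

    def start_request(self):
        if self.draining:
            raise RuntimeError("target is draining; route elsewhere")
        self.in_flight += 1

    def finish_request(self):
        self.in_flight -= 1

    def begin_drain(self):
        # From now on the balancer should send new requests to other targets.
        self.draining = True

    def safe_to_remove(self, waited_s: float, drain_timeout_s: float) -> bool:
        # Remove once idle, or give up waiting after the drain timeout.
        return self.draining and (self.in_flight == 0 or waited_s >= drain_timeout_s)


target = DrainingTarget()
target.start_request()
target.begin_drain()
print(target.accepts_new())                                     # False
print(target.safe_to_remove(waited_s=1, drain_timeout_s=30))    # False
target.finish_request()
print(target.safe_to_remove(waited_s=2, drain_timeout_s=30))    # True
```

The drain timeout is the trade-off named in the table earlier: too short cuts requests mid-flight, too long slows every deploy.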
A Practical Checklist Before You Ship
- Pick Layer 4 or Layer 7 based on what you need to route on.
- Set health checks that reflect real readiness, not just “process is running.”
- Choose an algorithm that matches request shape: round robin for uniform, least connections for uneven.
- Decide on stickiness only if your app needs it, then keep the window short.
- Align timeouts across client, balancer, and backend.
- Turn on logs and keep a request ID flowing through each hop.
- Test a failure: kill a backend and confirm traffic drains to healthy nodes.
Once you see the pieces as a flow—connection in, rules, health, selection, forward, response—you can reason about almost any production issue without guesswork. That’s the real payoff: not magic, just visibility and control over where traffic lands.
References & Sources
- Amazon Web Services (AWS). “How Elastic Load Balancing works.” Explains how a load balancer routes traffic to registered targets and uses health checks to send traffic only to healthy targets.
- NGINX. “Using NGINX as an HTTP load balancer.” Documents load balancing methods and configuration concepts used to distribute HTTP traffic across upstream servers.
