How Does A Load Balancer Work? | Traffic Flow Made Clear

A load balancer sits between users and servers, checks each request, then sends it to a healthy machine using a routing rule.

You’ve got users hitting an app, and you’ve got more than one server that can answer. Without a “traffic cop,” you get spikes, uneven load, and one unlucky node that melts down while others idle.

A load balancer fixes that by taking incoming connections, picking a backend target, and repeating that choice at scale.

This article breaks down what’s happening on the wire, which parts matter most in production, and what to watch when things go sideways.

What A Load Balancer Is And Why Teams Use One

A load balancer is a network service (hardware, software, or managed cloud) that receives client traffic and forwards it to a pool of backends. Backends can be VMs, containers, bare-metal servers, functions, or other proxies.

People add load balancing for three plain reasons: spread traffic across many machines, keep the app reachable when a machine fails, and scale without changing client behavior.

It also gives you one stable “front door” endpoint. Clients connect to that endpoint, while the pool behind it can change hour to hour.

How Does A Load Balancer Work? In Plain Terms

Start with the simplest mental model: the client connects to the load balancer, the load balancer selects a backend, then the backend returns a response. That sounds small, but the details decide whether your app feels snappy or flaky.

What Happens Step By Step

Client opens a connection. The user’s device resolves DNS, then connects to the load balancer IP (or a set of IPs).
The load balancer accepts the connection. A listener is waiting on a port and protocol (HTTP, HTTPS, TCP, UDP).
Routing rules kick in. The balancer checks where the request should go: a target group, a service, a path-based rule, or a weighted split.
Health status is checked. The balancer uses its last health results to avoid backends that are failing.
A backend is chosen. The selection follows an algorithm like round robin or least connections.
The request is forwarded. The balancer proxies the request or passes the connection through, depending on type.
The response flows back. The backend replies, and the client receives the response through the balancer.
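
The steps above can be sketched in a few lines of Python. This is a toy model, not how any real product is built: Backend, TinyBalancer, and the rule format are made-up names for illustration, and the "network" is just a method call.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Backend:
    name: str
    healthy: bool = True  # stands in for the last health-check result

    def handle(self, path: str) -> str:
        return f"{self.name} served {path}"


class TinyBalancer:
    """Hypothetical model of the flow: rule -> health filter -> selection -> forward."""

    def __init__(self, rules, pools):
        self.rules = rules    # ordered (path_prefix, pool_name) pairs, first match wins
        self.pools = pools    # pool_name -> list of Backend
        self._turn = count()  # one shared round-robin counter

    def handle(self, path: str) -> str:
        # Routing rules kick in.
        pool_name = next(
            (name for prefix, name in self.rules if path.startswith(prefix)), None
        )
        if pool_name is None:
            return "404 no matching rule"
        # Health status is checked, using stored results rather than a live probe.
        healthy = [b for b in self.pools[pool_name] if b.healthy]
        if not healthy:
            return "503 no healthy backend"
        # A backend is chosen (plain round robin in this sketch).
        backend = healthy[next(self._turn) % len(healthy)]
        # The request is forwarded and the response flows back.
        return backend.handle(path)


lb = TinyBalancer(
    rules=[("/images", "img"), ("/", "web")],
    pools={
        "img": [Backend("img-1")],
        "web": [Backend("web-1"), Backend("web-2", healthy=False)],
    },
)
print(lb.handle("/images/logo.png"))
print(lb.handle("/home"))
```

In this run, web-2 never receives traffic because its stored health result is False, which mirrors the "health status is checked" step.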

Two Connection Styles You’ll Hear About

Proxy mode: the balancer terminates the client connection, then opens a new connection to the backend. This is common for HTTP/HTTPS and gives the balancer visibility into requests.

Pass-through mode: the balancer forwards packets to the backend without terminating the client connection, usually at the TCP/UDP level. This can be leaner, but the balancer sees less of the request.

How A Load Balancer Decides Where Traffic Goes

The decision is a mix of rules and algorithms. Rules map requests into a set of eligible targets. Algorithms pick one target from that set.

Common Routing Rules

Host-based: send api.example.com to the API service and app.example.com to the web service.
Path-based: send /images to an image service and /checkout to a payment service.
Weighted splits: send 90% to v1 and 10% to v2 during a rollout.
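
Here is a rough sketch of how those three rule types could be evaluated. The Rule class, the rule table, and the service names are assumptions for illustration. Real balancers have their own rule syntax and usually stricter path matching than a plain prefix test. The weighted split hashes a key such as a user id so the same user keeps landing on the same version, which is one option among several.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    host: Optional[str]  # None matches any host
    path_prefix: str
    target_group: str


# Hypothetical rule table; order matters because the first match wins.
RULES = [
    Rule("api.example.com", "/", "api-service"),
    Rule("app.example.com", "/images", "image-service"),
    Rule("app.example.com", "/checkout", "payment-service"),
    Rule("app.example.com", "/", "web-service"),
]


def match_rule(host: str, path: str, rules=RULES) -> Optional[str]:
    """Return the target group of the first rule matching host and path."""
    for rule in rules:
        if rule.host in (None, host) and path.startswith(rule.path_prefix):
            return rule.target_group
    return None


def weighted_version(key: str, weights: dict) -> str:
    """Map a key to a version in proportion to positive integer weights."""
    total = sum(weights.values())
    digest = hashlib.sha256(key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for version, weight in weights.items():
        if bucket < weight:
            return version
        bucket -= weight
    raise ValueError("weights must be positive integers")


print(match_rule("app.example.com", "/checkout/cart"))
print(weighted_version("user-42", {"v1": 90, "v2": 10}))
```

Moving the catch-all "/" rule above the "/images" rule would shadow it, which is the rule-order problem to watch for in production.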

Common Selection Algorithms

Round robin: cycle through targets in order.
Weighted round robin: give stronger machines more traffic.
Least connections: prefer the target with fewer active connections.
Hash-based: pick a target by hashing a value like client IP, a header, or a session id.

Round robin is easy and works fine when backends are similar. Least connections is handy when requests have uneven duration. Hashing helps keep a user “stuck” to one backend when the app stores session state locally.
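
To make the differences concrete, here is a minimal sketch of all four algorithms. The function names are made up for this example, and the weighted version uses naive list expansion, where production balancers tend to interleave weighted targets more smoothly.

```python
import hashlib
from itertools import cycle


def round_robin(targets):
    """Iterator that yields targets in order, forever."""
    return cycle(targets)


def weighted_round_robin(weights):
    """Naive weighting: a target with weight 2 appears twice per cycle."""
    expanded = [t for t, w in weights.items() for _ in range(w)]
    return cycle(expanded)


def least_connections(active):
    """active maps target -> number of open connections."""
    return min(active, key=active.get)


def hash_pick(key, targets):
    """Same key, same target, as long as the target list does not change."""
    digest = hashlib.sha256(key.encode()).digest()
    return targets[int.from_bytes(digest[:8], "big") % len(targets)]


rr = round_robin(["a", "b", "c"])
print([next(rr) for _ in range(4)])

wrr = weighted_round_robin({"big": 2, "small": 1})
print([next(wrr) for _ in range(3)])

print(least_connections({"a": 5, "b": 2, "c": 7}))
print(hash_pick("203.0.113.7", ["a", "b", "c"]))
```

One caveat on the hash version: with simple modulo hashing, adding or removing a target remaps most keys. Consistent hashing reduces that churn, at the cost of more machinery.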

Layer 4 Vs Layer 7 Load Balancing

These labels describe what the balancer can “see.” Layer 4 is about TCP/UDP connections. Layer 7 is about application data like HTTP methods, paths, and headers.

Layer 4 In Practice

Layer 4 balancing makes choices using connection metadata: source IP, destination port, and protocol. It’s a solid fit for raw TCP services, databases that should not be proxied by an HTTP-aware layer, or traffic where you want minimal overhead.

Layer 7 In Practice

Layer 7 balancing can route based on the actual request. That means path routing, host routing, request rewrites, and cleaner troubleshooting. It also pairs well with TLS termination, since the balancer can decrypt and inspect traffic before forwarding it.

Health Checks: The Part That Keeps Outages Small

A load balancer is only as smart as its health checks. A “healthy” target should be able to do real work, not just accept a socket.

Most systems run periodic probes. If a target fails a number of checks in a row, it’s marked unhealthy and removed from routing. When it starts passing again, it’s added back.
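
That "in a row" logic is a small state machine. The sketch below assumes separate thresholds for going unhealthy and coming back, which is a common arrangement but not universal. TargetHealth and its default values are illustrative, not taken from any product.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    unhealthy_threshold: int = 3  # assumed: consecutive failures before removal
    healthy_threshold: int = 2    # assumed: consecutive passes before re-adding
    healthy: bool = True
    _streak: int = 0              # consecutive results that contradict the current state

    def record(self, probe_ok: bool) -> bool:
        """Feed in one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            self._streak = 0      # the result agrees with the state, so the streak resets
            return self.healthy
        self._streak += 1
        limit = self.healthy_threshold if probe_ok else self.unhealthy_threshold
        if self._streak >= limit:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy


target = TargetHealth()
probes = [False, False, True, False, False, False, True, True]
print([target.record(ok) for ok in probes])
```

The two early failures do not remove the target because a pass interrupts the streak. Only three failures in a row flip it, and it needs two passes in a row to return. That gap is what keeps a briefly slow target from flapping in and out of rotation.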

What A Good Health Check Looks Like

Fast: quick endpoint, low payload, short timeout.
Meaningful: returns success only when core dependencies are usable.
Stable: avoids random flaps from slow downstream calls.
Separate from user traffic: avoid endpoints that need auth tokens that can expire unnoticed.

If you run a managed cloud balancer, the docs are plain about health checks and routing only to healthy targets. AWS describes this behavior in its explanation, “How Elastic Load Balancing works.”

Table: Core Load Balancer Parts And What They Do

The names vary across products, but the pieces below show up in most setups.

Part	What It Does	What To Watch
Listener	Accepts connections on a port and protocol	Wrong protocol, missing HTTPS redirect, open ports
Routing rules	Maps a request into a backend group	Rule order, shadowed paths, surprising defaults
Target group	Pool of eligible backends	Stale targets, wrong ports, cross-zone choices
Health check	Detects failing targets	Flapping, too-strict checks, timeouts too low
Selection algorithm	Picks one target for each request or connection	Uneven weights, sticky sessions masking imbalance
Connection draining	Lets in-flight requests finish during scale-in	Cutting connections mid-request, long drain windows
TLS termination	Handles certificates and decrypts HTTPS	Old ciphers, cert rotation, missing HSTS
Access logs	Records request metadata for tracing issues	Sampling gaps, missing request IDs, storage cost
Rate controls	Limits abusive traffic bursts	False positives, uneven limits per route

Session Stickiness And Why It Can Surprise You

Some apps store session state in memory on the backend. If traffic bounces across machines, users get logged out or carts disappear. Stickiness keeps a given user tied to one backend for a while.

Safer Patterns Than Long-Term Stickiness

Store sessions in a shared store (Redis, database) so any backend can serve the user.
Keep stickiness windows short: long enough for bursts, not long enough to lock users to a sick node.
Prefer stateless tokens when the app can handle it.
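
If you do need stickiness, a short window that also respects health can look like the sketch below. StickyTable is a hypothetical name, time is passed in as plain numbers to keep the example deterministic, and the fallback argument stands in for whatever selection algorithm you already use.

```python
class StickyTable:
    """Pins a session to a target for ttl_s seconds, but never to an unhealthy one."""

    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self._pins = {}  # session_id -> (target, expires_at)

    def pick(self, session_id, healthy_targets, now, fallback):
        pin = self._pins.get(session_id)
        if pin is not None:
            target, expires_at = pin
            # Honor the pin only while it is fresh and the target is still healthy.
            if now < expires_at and target in healthy_targets:
                return target
        target = fallback(healthy_targets)
        self._pins[session_id] = (target, now + self.ttl_s)
        return target


def first(targets):
    return targets[0]


table = StickyTable(ttl_s=60)
print(table.pick("s1", ["a", "b"], now=0, fallback=first))   # new pin
print(table.pick("s1", ["b", "a"], now=30, fallback=first))  # pin still valid
print(table.pick("s1", ["b", "a"], now=61, fallback=first))  # pin expired, re-picked
print(table.pick("s1", ["a"], now=70, fallback=first))       # pinned target unhealthy
```

The health check on the pin is the part that prevents a sticky session from holding a user on a sick node until the window runs out.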

TLS Termination, Headers, And The Real Client IP

When the load balancer terminates TLS, it decrypts HTTPS and typically forwards plain HTTP to the backend, often inside a private network (some setups re-encrypt on the way to the backend instead). That makes certificate rotation and cipher policy a balancer concern, not an app concern.

One gotcha: your backend might see the balancer’s IP as the client. Most systems add a header like X-Forwarded-For that carries the original client IP. Your app and logs should trust that header only from known balancers.
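
A sketch of that trust rule, using Python's standard ipaddress module. The 10.0.0.0/24 range is a placeholder for wherever your balancers actually live, and the header lookup assumes a plain dict with the exact key shown. The list is read right to left because a client can put anything it likes in the leftmost entries.

```python
import ipaddress

# Assumption: balancers live in this private range. Replace with your own.
TRUSTED_BALANCERS = [ipaddress.ip_network("10.0.0.0/24")]


def is_trusted(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_BALANCERS)


def client_ip(peer_ip: str, headers: dict) -> str:
    """Best-effort original client IP for one request."""
    if not is_trusted(peer_ip):
        return peer_ip  # not from a known balancer, so ignore the header entirely
    hops = [h.strip() for h in headers.get("X-Forwarded-For", "").split(",")]
    # Walk from the nearest hop outward and skip our own balancers.
    for hop in reversed([h for h in hops if h]):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            break  # malformed entry, stop trusting the rest of the list
    return peer_ip


print(client_ip("10.0.0.5", {"X-Forwarded-For": "198.51.100.9, 203.0.113.7"}))
print(client_ip("192.0.2.44", {"X-Forwarded-For": "1.2.3.4"}))
```

In the first call the result is 203.0.113.7, the address the balancer itself observed. The 198.51.100.9 entry arrived from the client side and could have been forged.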

Observability: Signals That Tell You The Balancer Is The Bottleneck

If you’re troubleshooting, don’t guess. Start with three buckets: balancer health, backend health, and client behavior.

Balancer-Side Metrics Worth Tracking

Request rate and connection rate
Latency at the balancer (p50, p95, p99)
4xx and 5xx rates, split by route and target group
Healthy vs unhealthy target counts over time
TLS handshake errors and certificate failures
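
If p50, p95, and p99 are new to you, this short sketch shows why the tail matters. It uses the nearest-rank method on made-up latency samples. Monitoring tools often use other interpolation methods, so exact values can differ slightly.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical data: 97 fast requests and 3 very slow ones.
latencies_ms = [20] * 97 + [900, 1200, 1500]
for pct in (50, 95, 99):
    print(f"p{pct} = {percentile(latencies_ms, pct)} ms")
```

With this data, p50 and p95 both come out at 20 ms while p99 is 1200 ms. A dashboard that shows only the median would hide the slow requests completely.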

Table: Choosing The Right Style For Your App

This isn’t about brand names. It’s about matching features to the shape of your traffic.

Need	Better Fit	Reason
Path and host routing	Layer 7 balancer	It can read HTTP requests and apply rules
Raw TCP or UDP services	Layer 4 balancer	It routes connections with low overhead
One pool with mixed server sizes	Weighted algorithm	Stronger nodes get more traffic
Requests with uneven duration	Least connections	It avoids piling long requests on one node
Local session state on backends	Short stickiness window	It reduces cross-node session breaks
Zero-downtime deployments	Drain plus health checks	Old nodes finish work before removal
Public HTTPS with cert rotation	TLS termination at balancer	Central cert handling and policy control

Common Failure Modes And How To Avoid Them

Load balancing sounds simple until a busy day arrives. These are the problems that show up most often.

Health Checks That Lie

If your health endpoint returns success while the app can’t reach its database, the balancer will keep sending traffic into a dead end. Build a health endpoint that tests the dependencies that must be up for real requests to work.
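
One way to structure such an endpoint is to run a small set of dependency checks and return success only when all of them pass. The sketch below is framework-free; health_status and the check names are hypothetical, and real checks should use short timeouts so the probe itself stays fast.

```python
from typing import Callable, Dict, Tuple


def health_status(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run each dependency check and return (http_status, per-check report)."""
    report = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "failing"
        except Exception as exc:  # a crashing check counts as a failure
            report[name] = f"error: {type(exc).__name__}"
    status = 200 if all(v == "ok" for v in report.values()) else 503
    return status, report


def database_reachable() -> bool:
    # Stand-in for a real "SELECT 1" style probe with a short timeout.
    raise ConnectionError("database unreachable")


print(health_status({"database": database_reachable, "cache": lambda: True}))
```

Only list dependencies that every request truly needs. If an optional downstream service is included, one slow partner can pull every backend out of rotation at once.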

Sticky Sessions That Hide A Bad Node

When stickiness is long, a single degraded backend can keep hurting the same users for hours. Keep stickiness short, and alert on per-target error rates, not just global rates.
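
To see why a global rate can hide a bad node, here is a small sketch that computes 5xx rates per target from a made-up access log. The log format, a list of (target, status) pairs, is an assumption for the example.

```python
from collections import Counter


def error_rates(log):
    """log is an iterable of (target, http_status); returns target -> 5xx rate."""
    total, errors = Counter(), Counter()
    for target, status in log:
        total[target] += 1
        if status >= 500:
            errors[target] += 1
    return {target: errors[target] / total[target] for target in total}


# Hypothetical traffic: target "b" is small but badly degraded.
log = [("a", 200)] * 98 + [("a", 500)] * 2 + [("b", 200)] * 5 + [("b", 502)] * 5

rates = error_rates(log)
overall = sum(1 for _, status in log if status >= 500) / len(log)
print(rates)
print(round(overall, 3))
```

Here the overall error rate is about 6%, which may sit under a global alert threshold, while target b is failing half of its requests.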

Timeout Mismatch

Your client, balancer, and backend each have timeouts. If the balancer times out at 30 seconds but your backend is allowed to run for 60, the balancer will cut the connection early and the backend will keep working on a request the client will never see.
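
A quick sanity check can catch this before it bites. The sketch below assumes one common rule of thumb, that each hop should give up slightly before the hop in front of it, so the backend timeout is shorter than the balancer's and the balancer's is shorter than the client's. Your setup may call for a different ordering, and timeout_problems is a made-up helper.

```python
from typing import List


def timeout_problems(client_s: float, balancer_s: float, backend_s: float) -> List[str]:
    """Return a list of mismatches under the assumed backend < balancer < client rule."""
    problems = []
    if backend_s >= balancer_s:
        problems.append(
            f"backend may run {backend_s}s but the balancer gives up at {balancer_s}s"
        )
    if balancer_s >= client_s:
        problems.append(
            f"balancer waits {balancer_s}s but the client gives up at {client_s}s"
        )
    return problems


# The mismatch described above: balancer at 30 seconds, backend at 60.
for problem in timeout_problems(client_s=60, balancer_s=30, backend_s=60):
    print(problem)
```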

No Plan For Deploys

During a deploy, instances restart, containers roll, ports change. If the balancer doesn’t drain connections and your health checks don’t reflect readiness, users get bursts of errors. Add a readiness endpoint, use draining, and delay routing until the app is ready.
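
Draining boils down to two rules: stop sending new requests to the target, then wait for in-flight requests to finish or for a deadline to pass. Here is a minimal model of that, with DrainingTarget as a hypothetical name and time passed in as plain numbers.

```python
class DrainingTarget:
    """Tracks in-flight requests so a target can be removed without cutting them off."""

    def __init__(self):
        self.in_flight = 0
        self.draining = False

    def accepts_new(self) -> bool:
        return not self.draining

    def start_request(self) -> None:
        if self.draining:
            raise RuntimeError("target is draining; route this request elsewhere")
        self.in_flight += 1

    def finish_request(self) -> None:
        self.in_flight -= 1

    def begin_drain(self) -> None:
        self.draining = True

    def safe_to_remove(self, waited_s: float, drain_timeout_s: float) -> bool:
        # Remove once idle, or once the drain window has run out.
        return self.in_flight == 0 or waited_s >= drain_timeout_s


target = DrainingTarget()
target.start_request()
target.begin_drain()
print(target.accepts_new(), target.safe_to_remove(waited_s=5, drain_timeout_s=30))
target.finish_request()
print(target.safe_to_remove(waited_s=6, drain_timeout_s=30))
```

The drain timeout is the trade-off knob. Too short and long requests get cut, too long and every deploy waits on the slowest connection.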

A Practical Checklist Before You Ship

Pick Layer 4 or Layer 7 based on what you need to route on.
Set health checks that reflect real readiness, not just “process is running.”
Choose an algorithm that matches request shape: round robin for uniform, least connections for uneven.
Decide on stickiness only if your app needs it, then keep the window short.
Align timeouts across client, balancer, and backend.
Turn on logs and keep a request ID flowing through each hop.
Test a failure: kill a backend and confirm traffic drains to healthy nodes.

Once you see the pieces as a flow—connection in, rules, health, selection, forward, response—you can reason about almost any production issue without guesswork. That’s the real payoff: not magic, just visibility and control over where traffic lands.

References & Sources

Amazon Web Services (AWS). “How Elastic Load Balancing works.” Explains how a load balancer routes traffic to registered targets and uses health checks to send traffic only to healthy targets.
NGINX. “Using NGINX as an HTTP load balancer.” Documents load balancing methods and configuration concepts used to distribute HTTP traffic across upstream servers.