Think of the fuse box in your home. When a circuit draws too much current, the breaker trips and cuts the power — not to annoy you, but to stop a small fault from burning the house down. Once you've fixed the problem, you flip it back on.
A software circuit breaker does the same job for calls between services. It sits in front of a dependency — a database, a payment API, another microservice — and watches for failures. When a downstream component starts misbehaving, the breaker trips, and your calls stop flowing to it until it looks healthy again.
The problem: a sick dependency drags everyone down
Suppose your checkout service calls a payment provider, and the provider goes slow — every request now hangs for 30 seconds before timing out. The naive instinct is to retry, but retries on a struggling service just pile on more load, pushing it from slow to fully down.
Worse, every hung request holds onto a thread and a connection while it waits. Under steady traffic those resources fill up fast, and soon your checkout service has nothing left to serve anyone — even requests that have nothing to do with payments. The failure has cascaded: one sick dependency took down a healthy service, which can in turn take down its callers, rippling across the whole system.
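To see how quickly those resources fill up, here is a rough back-of-the-envelope calculation using Little's law (average requests in flight = arrival rate × time each request spends in the system). The pool size, traffic rate, and healthy latency are made-up numbers for illustration; only the 30-second hang comes from the scenario above.

```python
def in_flight_requests(arrival_rate_per_s: float, latency_s: float) -> float:
    """Little's law: average concurrent requests = arrival rate * time in system."""
    return arrival_rate_per_s * latency_s


POOL_SIZE = 200       # assumed number of worker threads in the checkout service
ARRIVAL_RATE = 50.0   # assumed payment calls per second

healthy = in_flight_requests(ARRIVAL_RATE, 0.2)    # assumed 200 ms per healthy call
degraded = in_flight_requests(ARRIVAL_RATE, 30.0)  # every call hangs for 30 s

# While every call hangs, threads are claimed at the arrival rate and none are
# released during the first 30 s, so the pool drains in POOL_SIZE / ARRIVAL_RATE seconds.
seconds_to_exhaustion = POOL_SIZE / ARRIVAL_RATE

print(f"healthy:  about {healthy:.0f} threads busy out of {POOL_SIZE}")
print(f"degraded: about {degraded:.0f} threads needed, but the pool has {POOL_SIZE}")
print(f"pool exhausted roughly {seconds_to_exhaustion:.0f} s after the slowdown starts")
```

Under these assumptions the service goes from using a small fraction of its pool to needing several times more threads than it has, within a few seconds of the provider slowing down.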
How it works
A circuit breaker is a small state machine wrapped around the risky call, and it lives in one of three states:
- Closed — the normal state. Requests flow straight through the breaker to the service, and the breaker quietly counts how many succeed and fail. As long as failures stay below the failure threshold, it does nothing but watch.
- Open — once failures cross that threshold, the breaker trips open. Now every call fails fast at the breaker itself, returning an error (or a fallback) instantly without ever touching the sick service. This is what protects your threads and gives the dependency breathing room to recover.
- Half-Open — after a reset timeout elapses, the breaker cautiously lets a single probe request through to test the waters. If that probe succeeds, the dependency looks healthy and the breaker snaps back to Closed. If it fails, the breaker trips Open again and waits out another timeout.
The animation below walks through exactly this lifecycle: healthy traffic in Closed, the service starting to fail until the breaker trips Open, the wait, the single Half-Open probe, and the return to Closed on success.
- Client: makes calls to a downstream service through the breaker.
- Circuit Breaker: a guard that watches for failures and trips open to stop calling a sick service.
- Service: the downstream dependency being called, which may be slow or failing.
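The three states described above can be sketched as a small Python class. This is a minimal illustration, not a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are chosen for this example, the breaker counts consecutive failures (one simple way to define the threshold), and it is not thread-safe, so it does not by itself guarantee a single probe under concurrent load.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay Open before probing
        self.clock = clock                          # injectable so tests can control time
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: the dependency is never touched.
                raise CircuitOpenError("circuit is open")
            # Reset timeout elapsed: let this call through as the probe.
            self.state = State.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success (including a successful probe) returns the breaker to Closed.
        self.failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed probe re-opens immediately; otherwise trip at the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would wrap the risky call, for example `breaker.call(charge_card, order)` with a hypothetical `charge_card` function, and handle `CircuitOpenError` wherever it decides what to show the user.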
Why failing fast is a feature, not a bug. A request that fails in a millisecond is far kinder than one that hangs for 30 seconds. The fast failure releases the thread and connection immediately, keeps the rest of your service responsive, and stops the pile-up of waiting requests that turns one outage into many.
What to do when the breaker is open
An open breaker means don't call the dependency — but it doesn't have to mean return an error. The best breakers pair the trip with a fallback so the user still gets something useful.
Common fallbacks include serving a slightly stale cached value, returning a sensible default (an empty recommendations list instead of personalized ones), or degrading gracefully — showing the page without the optional widget that depends on the broken service. The goal is to contain the blast radius: a payment outage might block checkout, but it shouldn't take down browsing, search, or the rest of the site.
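As a rough sketch of the first two fallbacks, the function below serves the last known value when the breaker is open and an empty list when nothing has been cached. The names `BreakerOpen`, `recommendations_with_fallback`, and the in-memory `_last_known` dictionary are hypothetical; `fetch` stands for whatever breaker-wrapped call your service actually makes, assumed here to raise `BreakerOpen` when the call is rejected.

```python
from typing import Callable, Dict, List


class BreakerOpen(Exception):
    """Assumed to be raised by the breaker-wrapped call when the circuit is open."""


# Last successful result per user; a real service would likely use a shared cache.
_last_known: Dict[str, List[str]] = {}


def recommendations_with_fallback(
    user_id: str, fetch: Callable[[str], List[str]]
) -> List[str]:
    try:
        fresh = fetch(user_id)
    except BreakerOpen:
        # Breaker is open: serve a possibly stale value, or a sensible default.
        return _last_known.get(user_id, [])
    _last_known[user_id] = fresh  # remember the good value for the next outage
    return fresh
```

Only the breaker's own rejection is caught here, so unexpected errors still surface; whether other failures should also fall back is a design choice for each call site.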
The trade-offs
A circuit breaker is only as good as its tuning, and tuning is genuinely tricky:
- Thresholds and timeouts — trip too eagerly and you get false trips, cutting off a dependency that was only briefly slow. Trip too reluctantly and the breaker never opens in time to help. These numbers depend on real traffic and usually need adjusting over time.
- Testing the failure paths — the open and fallback branches run rarely, so they're easy to get wrong and easy to forget. If your fallback code is broken, you only find out during an outage — the worst possible moment.
- Hidden state — a breaker adds behavior that's invisible in the happy path. Operators need metrics and alerts on trips, or a tripped breaker silently dropping traffic can be mistaken for the outage itself.
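One common way to make the threshold less twitchy is to trip on a failure rate over a sliding window of recent calls, and only once enough calls have been seen. The sketch below illustrates that idea; the class name, parameter names, and default values are assumptions for this example rather than recommended settings.

```python
from collections import deque


class FailureRateWindow:
    """Tracks the most recent call outcomes and decides whether to trip."""

    def __init__(self, window_size=20, min_calls=10, failure_rate_threshold=0.5):
        self.outcomes = deque(maxlen=window_size)  # True means the call failed
        self.min_calls = min_calls
        self.failure_rate_threshold = failure_rate_threshold

    def record(self, failed: bool) -> None:
        # The deque drops the oldest outcome automatically once it is full.
        self.outcomes.append(failed)

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_trip(self) -> bool:
        # Too few samples: a couple of early errors should not open the breaker.
        if len(self.outcomes) < self.min_calls:
            return False
        return self.failure_rate() >= self.failure_rate_threshold
```

Exporting `failure_rate()` as a metric alongside a counter of trips is one way to give operators the visibility the last bullet asks for.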
Don't let the fallback rot. Because the Open and Half-Open paths fire only during failures, they're the least-exercised code you own. Deliberately test them — fault injection, chaos drills, or a forced-open switch — so you discover a broken fallback in a drill, not in a real incident.
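A drill can be as small as a unit test that forces the breaker open and asserts on what the caller gets back. The sketch below uses a stand-in `ForcibleBreaker` that models only a hypothetical `forced_open` switch; a real breaker library may expose this differently, or not at all, so treat the names as placeholders.

```python
class ForcibleBreaker:
    """Stand-in breaker that models only a hypothetical forced-open switch."""

    def __init__(self):
        self.forced_open = False

    def call(self, func, fallback):
        if self.forced_open:
            return fallback()  # the dependency is never touched
        return func()


def load_widget(breaker, fetch_widget):
    """Code under test: the live widget if possible, a placeholder otherwise."""
    return breaker.call(
        fetch_widget, fallback=lambda: {"widget": None, "degraded": True}
    )


def test_forced_open_serves_fallback():
    calls = []

    def fetch_widget():
        calls.append("called")
        return {"widget": "live", "degraded": False}

    breaker = ForcibleBreaker()
    assert load_widget(breaker, fetch_widget) == {"widget": "live", "degraded": False}

    breaker.forced_open = True  # the drill: simulate a tripped breaker
    assert load_widget(breaker, fetch_widget) == {"widget": None, "degraded": True}
    # Only the first, closed-state call reached the dependency.
    assert calls == ["called"]
```

The test function follows the pytest naming convention, so a test runner that collects `test_*` functions would exercise the fallback on every build rather than only during an outage.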
When to reach for it
Reach for a circuit breaker whenever one service calls another over the network and a failure there could pile up and cascade — especially for remote calls that can hang, like third-party APIs and inter-service requests. It's most valuable in front of dependencies that are slow to fail and that you call often.
It complements other resilience tools rather than replacing them. A load balancer routes around a single dead instance, while a circuit breaker protects you when an entire dependency is unhealthy — and a cache gives you a ready-made fallback to serve while the breaker is open. Used together, a brief outage downstream becomes a graceful degradation instead of a system-wide failure.