Rate Limiting

Imagine pouring a large jug of water into a funnel. Dump it all at once and it overflows everywhere; pour it at a controlled pace and every drop makes it through. When your application calls an external API — a payment provider, a mapping service, a partner feed — you're pouring requests into someone else's funnel, and they've decided exactly how fast it can drain.

The Rate Limiting pattern is how you do the controlled pour. Rather than firing requests the instant you have them and hoping the provider keeps up, you deliberately pace your own outbound traffic to stay within the limit the downstream service allows.

The problem

Almost every external service caps how often you may call it: say, 100 requests per second, or 10,000 per day. Exceed the cap and you don't just get a polite slowdown. You get rejected requests, HTTP 429 Too Many Requests responses, temporary bans, or even billing penalties. Your work fails not because anything was wrong with it, but because you sent it too fast.

This is easy to confuse with throttling, but the direction is opposite. Throttling is defensive on the inbound side — it protects your service from being overwhelmed by callers. Rate limiting is considerate on the outbound side — it protects a downstream service from being overwhelmed by you. Naively retrying rejected calls only makes it worse, hammering an already-saturated limiter and triggering ever-longer penalties.

Figure: Without rate limiting, flooding the funnel (burst of work → downstream API with a quota → 429, throttled or banned)

Bursts of outbound calls go straight to the downstream API at full speed. Past its quota it answers with 429s and bans, so work fails not because it was wrong, but because it was sent too fast.

How it works

The classic mechanism is a token bucket. The bucket holds tokens and refills at exactly the rate the downstream service permits — say, 100 tokens per second. Every outbound call must spend a token. If tokens are available, the call goes immediately; if the bucket is empty, the call waits until a token refills rather than being fired off to fail.
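The mechanism described above can be sketched in a few lines of Python. This is a minimal, single-process illustration rather than a production limiter: the class name `TokenBucket`, its method names, and the injectable `clock` and `sleep` parameters are all assumptions made for the example, and the bucket is assumed to start full.

```python
import threading
import time


class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/second, capped at `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self._clock = clock            # injectable so the bucket can be tested without waiting
        self._sleep = sleep
        self._tokens = float(capacity)  # assumption: the bucket starts full
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self):
        # Credit the tokens earned since the last check, never exceeding capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self):
        """Spend one token if available; return False instead of waiting."""
        with self._lock:
            self._refill()
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False

    def acquire(self):
        """Block until a token is available, then spend it."""
        while True:
            with self._lock:
                self._refill()
                if self._tokens >= 1.0:
                    self._tokens -= 1.0
                    return
                # Time until the missing fraction of a token has refilled.
                wait = (1.0 - self._tokens) / self.rate
            self._sleep(wait)  # sleep outside the lock so other threads can proceed
```

A caller would wrap each outbound request with `bucket.acquire()` before sending, for example `bucket = TokenBucket(rate=100, capacity=100)` for a 100-requests-per-second quota. Whether the capacity should equal the per-second rate depends on how much burst the provider actually tolerates.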

The bucket's size sets how big a burst you can absorb before pacing kicks in, while the refill rate enforces the long-run average. Work that can't go out right now sits in a buffer until its turn, so a sudden spike of 1,000 requests drains out smoothly at the allowed pace instead of being rejected en masse. The diagram below shows requests arriving in bursts, being metered by the limiter, and leaving as a steady, compliant stream to the downstream service.

Figure: Rate limiting, pouring at a pace the funnel can take (burst of work → token bucket → downstream API with a quota)

Bursts of outbound work meet a token bucket that releases calls at the allowed rate, so the downstream API receives a steady, quota-compliant stream instead of a flood.
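To make the spike of 1,000 requests concrete, the short sketch below computes when each request would leave the buffer. It assumes all requests arrive at the same instant, the bucket starts full, and tokens refill continuously. The function name `release_times` and the bucket capacity of 100 are illustrative assumptions; the text only fixes the refill rate.

```python
def release_times(n_requests, rate, capacity):
    """Release time in seconds for each of n_requests that all arrive at t=0."""
    times = []
    for i in range(n_requests):
        if i < capacity:
            # Covered by tokens already sitting in the (full) bucket.
            times.append(0.0)
        else:
            # Must wait for the (i - capacity + 1)-th refilled token.
            times.append((i - capacity + 1) / rate)
    return times


schedule = release_times(n_requests=1000, rate=100, capacity=100)
print(schedule[0], schedule[99], schedule[100], schedule[-1])
# prints: 0.0 0.0 0.01 9.0
```

Under these assumptions the first 100 calls go out at once as the permitted burst, and the remaining 900 drain at 100 per second, so the last one leaves after 9 seconds rather than being rejected.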

Tip

Coordinate the bucket across instances. A per-process limiter is fine for one worker, but ten workers each pacing to the full limit can collectively send up to ten times the allowed rate. When you scale out, the token bucket usually needs to live in shared state (a cache like Redis) so the whole fleet shares one budget.
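One way to picture a shared budget is a counter that every worker increments in the same store. The sketch below deliberately swaps in a fixed-window counter, which is simpler than a token bucket and is not the same algorithm: it can let through up to twice the limit across a window boundary. It assumes `client` is a redis-py-style object exposing `incr` and `expire`, that worker clocks are roughly in sync, and the function name and key scheme are hypothetical.

```python
import time


def try_acquire_shared(client, name, limit_per_second, now=None):
    """Fixed-window counter shared through `client` (redis-py-style incr/expire).

    Returns True if this call fits the fleet-wide budget for the current
    one-second window, False if the caller should wait and try again.
    """
    now = time.time() if now is None else now
    window = int(now)                    # index of the current one-second window
    key = f"ratelimit:{name}:{window}"   # hypothetical key scheme
    count = client.incr(key)             # one atomic counter shared by all workers
    if count == 1:
        client.expire(key, 2)            # first writer sets a TTL so old windows clean up
    return count <= limit_per_second
```

Because every worker increments the same key, the fleet as a whole is held to `limit_per_second`, however many processes are running. A true shared token bucket would need the refill and spend steps to happen atomically in the store, which this sketch does not attempt.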

When to use it

Use rate limiting whenever you call a service that publishes a quota and you'd rather pace yourself than be cut off. It pairs naturally with the Queue-Based Load Leveling pattern: a queue absorbs the bursty work, and the rate limiter drains it at a sustainable speed. It also makes your retry logic far gentler: instead of retrying immediately into a wall, retries wait for the next available token, so you stop amplifying the very congestion you're trying to recover from.
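The queue-plus-limiter pairing and the gentler retries can be sketched together. In this illustration `drain`, `send`, and `acquire` are hypothetical names: `acquire` stands for any blocking call that waits for a token, and `send(item)` is assumed to return an HTTP-style status code. Anything other than 429 is treated as finished, which a real worker would refine.

```python
import queue


def drain(work, send, acquire, max_attempts=3):
    """Drain `work` (a queue.Queue of (item, attempt) pairs) at the limiter's pace.

    Returns (done, failed): items that were sent, and items that were still
    rejected with 429 after max_attempts tries.
    """
    done, failed = [], []
    while True:
        try:
            item, attempt = work.get_nowait()
        except queue.Empty:
            return done, failed
        acquire()                          # every attempt, retries included, waits for a token
        status = send(item)
        if status == 429 and attempt + 1 < max_attempts:
            work.put((item, attempt + 1))  # requeue behind the other waiting work
        elif status == 429:
            failed.append(item)            # out of attempts; surface it to the caller
        else:
            done.append(item)              # sketch: any non-429 status counts as finished
```

The design choice worth noting is that a rejected item goes back into the same queue instead of being retried in a tight loop, so a retry costs a token like any other call and cannot exceed the pace the limiter allows.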

Where it's overkill: low-volume calls that never approach any limit, or fire-and-forget traffic where the occasional rejection genuinely doesn't matter. But the moment you're doing bulk work against a metered API — sending notifications, syncing records, scraping a feed — pacing your outbound flow is the difference between steady throughput and a stream of rejections.
