
Asynchronous Request-Reply

When the work takes too long for one HTTP call, accept the request fast, do the work in the background, and let the caller poll for the result.

You order a custom cake. The bakery doesn't make you stand at the counter for three hours while they bake it — they take your order, hand you a ticket number, and tell you to check back later. When you return and your number's up, you collect the cake.

The Asynchronous Request-Reply pattern is that bakery ticket, for APIs. Some operations are just too slow to finish inside a single HTTP request. Rather than holding the connection open and hoping it doesn't time out, you accept the request immediately, hand back a ticket, and let the caller check back for the result.

The problem

HTTP is built for quick, synchronous exchanges: send a request, get a response, done. But plenty of real work (generating a report, transcoding a video, running a big calculation) takes seconds or minutes. Try to do it inside one request and everything fights you: browsers, proxies, and load balancers commonly give up after a timeout that is often in the range of 30 to 60 seconds, the open connection ties up server resources (a whole thread, on thread-per-request servers) for the duration, and a network blip leaves the client with no idea whether the work finished or not.

You can't always make the slow thing fast. So instead of pretending the call is quick, you need a way to break the link between asking for the work and waiting for it to complete.

Figure: One slow request that times out. The client waits on a single request held open through the load balancer (60s cap) while the API does the slow work. Doing the heavy work inside a single synchronous call ties up a thread and trips the load balancer's timeout before the answer is ready.

How it works

The flow has three steps. First, the client sends the request and the API accepts it immediately: it validates the input, drops a job onto a queue, and responds with 202 Accepted plus a status URL (the ticket), commonly carried in a Location header. The whole exchange completes in milliseconds.

Second, a background worker picks the job off the queue and does the heavy lifting at its own pace, completely decoupled from the original caller. Third, the client polls the status URL: it gets 200 OK with "still working" until the job finishes, at which point the status endpoint points to the finished result (often a 303 See Other redirect to the result resource). The diagram below traces a request being accepted, processed by the worker, and the client polling until the answer is ready.

Figure: Accept fast, process later, poll for the answer. Components: Client, API (202 + status URL), Queue, Worker, Status / Result. The API returns 202 with a status URL; a worker processes the job off a queue while the client polls until the result is ready.
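
The three steps can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: the `jobs` dictionary and `work_queue` stand in for a durable job store and a real message queue, the handlers return plain `(status, headers, body)` tuples instead of sitting behind a web framework, and the `/jobs/<id>/status` and `/jobs/<id>/result` paths are hypothetical names chosen for the example.

```python
import queue
import uuid

# Hypothetical in-memory stand-ins for a durable job store and a message queue.
jobs = {}                     # job_id -> {"state": ..., "result": ...}
work_queue = queue.Queue()


def submit(payload):
    """Step 1: validate, enqueue, and answer 202 with a status URL."""
    if "n" not in payload:
        return 400, {}, {"error": "missing 'n'"}
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"state": "pending", "result": None}
    work_queue.put((job_id, payload))
    return 202, {"Location": f"/jobs/{job_id}/status"}, {"job_id": job_id}


def run_worker_once():
    """Step 2: a background worker takes one job and does the slow work."""
    job_id, payload = work_queue.get()
    result = sum(range(payload["n"]))   # placeholder for the heavy lifting
    jobs[job_id] = {"state": "done", "result": result}
    work_queue.task_done()


def status(job_id):
    """Step 3: 200 while the job is pending, 303 to the result once done."""
    job = jobs.get(job_id)
    if job is None:
        return 404, {}, {"error": "unknown job"}
    if job["state"] != "done":
        return 200, {}, {"state": job["state"]}
    return 303, {"Location": f"/jobs/{job_id}/result"}, {}


def result(job_id):
    """The result resource that the 303 redirect points to."""
    job = jobs.get(job_id)
    if job is None or job["state"] != "done":
        return 404, {}, {"error": "no result yet"}
    return 200, {}, {"result": job["result"]}
```

Calling `submit({"n": 5})` returns immediately with a 202 and a status URL; `status(...)` answers 200 until `run_worker_once()` has processed the job, after which it answers 303 with the result location.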
Tip

Make accepting a job idempotent. A client that times out before getting its 202 will retry — and you don't want to start the same expensive job twice. Key the request so a retry returns the existing job's status URL instead of creating a duplicate.
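
One way to key the request is a client-supplied idempotency key. The sketch below assumes the client sends the same key on every retry of the same logical request; `submit_idempotent`, `jobs_by_key`, and `started` are hypothetical names, and the in-memory dictionary stands in for a durable store with an atomic insert-if-absent operation.

```python
import threading
import uuid

# Hypothetical in-memory stores; a real service would need a durable store
# with an atomic "insert if absent" so that retries hitting different
# server instances still agree on the job.
jobs_by_key = {}    # idempotency key -> job_id
started = []        # job_ids that were actually enqueued
_lock = threading.Lock()


def submit_idempotent(idempotency_key, payload):
    """Return 202 and a status URL, creating a job only on first sight of the key."""
    with _lock:
        job_id = jobs_by_key.get(idempotency_key)
        if job_id is None:
            job_id = str(uuid.uuid4())
            jobs_by_key[idempotency_key] = job_id
            started.append(job_id)      # stands in for enqueueing the job
    return 202, {"Location": f"/jobs/{job_id}/status"}
```

A retry with the same key gets back the same status URL, and the expensive job is enqueued only once.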

When to use it

Use this pattern whenever an operation is too slow to fit comfortably in a single request but the client still needs the eventual result — report generation, media processing, bulk imports, anything compute-heavy. It keeps your front-end connections short and snappy, and lets the background work scale independently behind a queue with competing consumers.
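
To illustrate what "competing consumers" means, here is a small sketch using threads and Python's standard `queue.Queue`. It assumes a single process and a trivial squaring task as a placeholder for real work; in a deployed system the workers would more likely be separate processes or machines reading from a message broker. `process_all` is a hypothetical helper name.

```python
import queue
import threading


def process_all(items, worker_count):
    """Run `worker_count` workers that compete for items on one shared queue."""
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:            # sentinel: no more work for this worker
                work.task_done()
                return
            value = item * item         # placeholder for the slow job
            with lock:
                results[item] = value
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for item in items:
        work.put(item)
    for _ in threads:
        work.put(None)                  # one sentinel per worker
    work.join()                         # wait until every queued entry is handled
    for t in threads:
        t.join()
    return results
```

Each item is handled by exactly one worker, so adding workers raises throughput without any change to the code that enqueues jobs.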

Skip it when the work is genuinely fast; the extra machinery and the polling protocol aren't worth it for a 50-millisecond query. And if the client can't poll — or you'd rather push the result the moment it's ready — consider webhooks, WebSockets, or server-sent events instead. Async request-reply is the simplest fit when the client is happy to come back and ask, "Is it done yet?"
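
The client's side of "Is it done yet?" can be sketched as a polling loop with exponential backoff and an overall deadline. The `get_status` argument is a hypothetical callable that performs one GET against the status URL without following redirects and returns `(status_code, headers, body)`; many HTTP clients follow redirects automatically, in which case the loop would look at the final response instead. The interval and timeout defaults are illustrative assumptions, not recommendations.

```python
import time


def poll_until_done(get_status, status_url, interval=1.0, max_interval=30.0,
                    timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll the status URL until it redirects to the result; return that URL."""
    deadline = clock() + timeout
    while True:
        code, headers, _body = get_status(status_url)
        if code == 303:
            return headers["Location"]          # the job is finished
        if code != 200:
            raise RuntimeError(f"unexpected status {code}")
        if clock() + interval > deadline:
            raise TimeoutError("job did not finish before the deadline")
        sleep(interval)
        interval = min(interval * 2, max_interval)   # back off between polls
```

Backing off keeps a long-running job from generating a steady stream of status requests, and the deadline stops the client from polling forever if the job is lost.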

Key takeaways

  • Async request-reply decouples a slow operation from the HTTP request that triggers it.
  • The API accepts the request, returns 202 Accepted with a status URL, and processes the work in the background.
  • The client polls the status endpoint until the result is ready, then fetches it.
  • It keeps front-end connections short and lets the heavy work scale independently behind a queue.
  • The trade-off is more moving parts and a polling protocol the client must follow.
