You order a custom cake. The bakery doesn't make you stand at the counter for three hours while they bake it — they take your order, hand you a ticket number, and tell you to check back later. When you return and your number's up, you collect the cake.
The Asynchronous Request-Reply pattern is that bakery ticket, for APIs. Some operations are just too slow to finish inside a single HTTP request. Rather than holding the connection open and hoping it doesn't time out, you accept the request immediately, hand back a ticket, and let the caller check back for the result.
The problem
HTTP is built for quick, synchronous exchanges: send a request, get a response, done. But plenty of real work (generating a report, transcoding a video, running a big calculation) takes seconds or minutes. Try to do it inside one request and everything fights you: browsers, proxies, and load balancers commonly give up on a request after a timeout that is often in the range of 30 to 60 seconds, the open connection holds server resources (a whole thread, on thread-per-request servers) for the duration, and a network blip leaves the client with no idea whether the work finished or not.
Often you can't just make the slow thing fast. So instead of pretending the call is quick, you need a way to break the link between asking for the work and waiting for it to complete.
- Client: Holds a single request open, with no idea whether the slow work ever finished.
- Load Balancer: Cuts the connection after 30–60 seconds, so long operations never get to reply.
- API: Ties up a thread running the slow job inside one request, until the timeout kills it.
How it works
The flow has three steps. First, the client sends the request and the API accepts it immediately — it validates the input, drops a job onto a queue, and responds with 202 Accepted plus a status URL (the ticket). The connection closes in milliseconds.
Second, a background worker picks the job off the queue and does the heavy lifting at its own pace, completely decoupled from the original caller. Third, the client polls the status URL: it gets 200 OK with "still working" until the job finishes, at which point the status endpoint points to the finished result (often a 303 See Other redirect to the result resource). The diagram below traces a request being accepted, processed by the worker, and the client polling until the answer is ready.
- API: Accepts the request instantly, enqueues the job, and returns 202 with a status URL.
- Queue: Holds the job so the slow work is decoupled from the original request.
- Worker: Picks the job off the queue and does the heavy processing at its own pace.
- Status / Result: The endpoint the client polls; it serves the finished result once ready.
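The three steps can be sketched in a few dozen lines. This is a minimal in-process simulation, not a real HTTP service: the function names (`submit`, `get_status`, `get_result`, `poll_until_done`), the `/status/...` and `/results/...` URL shapes, and the in-memory dictionary and queue are all assumptions made for illustration. Each handler returns a `(status_code, headers, body)` tuple in place of a real response.

```python
import queue
import threading
import time
import uuid

# In-memory stand-ins for a real job store and message queue (assumptions).
jobs = {}                      # job_id -> {"state": ..., "result": ...}
jobs_lock = threading.Lock()
job_queue = queue.Queue()


def submit(payload):
    """Step 1: validate, enqueue, and answer 202 with a status URL."""
    if not isinstance(payload, dict) or not isinstance(payload.get("n"), int):
        return 400, {}, {"error": "payload must include an integer 'n'"}
    job_id = uuid.uuid4().hex
    with jobs_lock:
        jobs[job_id] = {"state": "pending", "result": None}
    job_queue.put((job_id, payload))
    return 202, {"Location": f"/status/{job_id}"}, {"state": "pending"}


def worker():
    """Step 2: pull jobs off the queue and do the slow work."""
    while True:
        job_id, payload = job_queue.get()
        time.sleep(0.05)                        # stands in for the slow part
        result = sum(range(payload["n"] + 1))   # hypothetical "heavy" job
        with jobs_lock:
            jobs[job_id] = {"state": "done", "result": result}
        job_queue.task_done()


def get_status(job_id):
    """Step 3: 200 while pending, 303 pointing at the result once done."""
    with jobs_lock:
        job = jobs.get(job_id)
    if job is None:
        return 404, {}, {"error": "unknown job"}
    if job["state"] != "done":
        return 200, {}, {"state": job["state"]}
    return 303, {"Location": f"/results/{job_id}"}, {}


def get_result(job_id):
    """Serve the finished result; 404 if the job is unknown or not done."""
    with jobs_lock:
        job = jobs.get(job_id)
    if job is None or job["state"] != "done":
        return 404, {}, {"error": "no result"}
    return 200, {}, {"result": job["result"]}


def poll_until_done(status_url, interval=0.01, timeout=2.0):
    """Client side: poll the status URL, then follow the redirect."""
    job_id = status_url.rsplit("/", 1)[-1]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code, headers, _ = get_status(job_id)
        if code == 303:
            result_id = headers["Location"].rsplit("/", 1)[-1]
            return get_result(result_id)
        if code != 200:
            raise RuntimeError(f"unexpected status {code}")
        time.sleep(interval)
    raise TimeoutError("job did not finish in time")


# One background worker; a daemon thread so it never blocks interpreter exit.
threading.Thread(target=worker, daemon=True).start()
```

Calling `submit({"n": 10})` returns immediately with a 202 and a `Location` header; passing that URL to `poll_until_done` blocks only the client, not the API. In a real deployment the dictionary would likely be a database and the queue a message broker, so that the API and workers can run as separate processes.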
Make accepting a job idempotent. A client that times out before getting its 202 will retry — and you don't want to start the same expensive job twice. Key the request so a retry returns the existing job's status URL instead of creating a duplicate.
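One way to key the request is to have the client send a unique token with each logical submission and reuse it on retries. The sketch below assumes such a client-supplied key (for example, carried in an `Idempotency-Key` header, which is a common convention rather than something this pattern mandates); the `accept` function, the `enqueued` list, and the URL shape are hypothetical.

```python
import threading
import uuid

_jobs_by_key = {}              # idempotency key -> job_id
_lock = threading.Lock()
enqueued = []                  # stand-in for the real job queue (assumption)


def accept(idempotency_key, payload):
    """Return (status_code, status_url); a retry with the same key reuses the job."""
    with _lock:
        job_id = _jobs_by_key.get(idempotency_key)
        if job_id is None:
            # First time this key is seen: create and enqueue exactly one job.
            job_id = uuid.uuid4().hex
            _jobs_by_key[idempotency_key] = job_id
            enqueued.append((job_id, payload))
    return 202, f"/status/{job_id}"
```

The lookup and the insert happen under one lock so that two concurrent retries cannot both create a job. With a shared database, a unique constraint on the key column would typically play the same role.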
When to use it
Use this pattern whenever an operation is too slow to fit comfortably in a single request but the client still needs the eventual result — report generation, media processing, bulk imports, anything compute-heavy. It keeps your front-end connections short and snappy, and lets the background work scale independently behind a queue with competing consumers.
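As a rough illustration of competing consumers, the sketch below runs several worker threads against one shared queue, so each job is claimed by exactly one worker and throughput can be raised by adding workers. The function name `run_competing_consumers` and the use of threads and an in-process `queue.Queue` are assumptions; a production setup would more likely use separate worker processes and a message broker.

```python
import queue
import threading


def run_competing_consumers(items, num_workers=3):
    """Process items with several workers sharing one queue.

    Returns a dict mapping each item to the name of the worker that handled it.
    """
    q = queue.Queue()
    handled = {}
    lock = threading.Lock()

    def consume(name):
        while True:
            item = q.get()
            if item is None:          # sentinel: no more work for this worker
                q.task_done()
                return
            with lock:
                handled[item] = name  # each item is claimed by exactly one worker
            q.task_done()

    threads = [
        threading.Thread(target=consume, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    for _ in threads:
        q.put(None)                   # one sentinel per worker, after all items
    q.join()
    for t in threads:
        t.join()
    return handled
```

Which worker ends up with which item is not deterministic; the point is only that the front end enqueues without knowing or caring how many consumers are running.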
Skip it when the work is genuinely fast; the extra machinery and the polling protocol aren't worth it for a 50-millisecond query. And if the client can't poll — or you'd rather push the result the moment it's ready — consider webhooks, WebSockets, or server-sent events instead. Async request-reply is the simplest fit when the client is happy to come back and ask, "Is it done yet?"