

Use background mode when a response may take a long time to complete, such as multi-step tool use or large generation tasks. The API returns immediately with a queued response, and you retrieve the result later by polling or streaming.

Create a background response

Set background to true and stream to false. The API creates the response, starts processing it asynchronously, and returns immediately with status: "queued".
mka1 llm responses create \
  --background \
  --model auto \
  --input '"Write a 500-word essay about the history of the internet."' \
  -H 'X-On-Behalf-Of: <end-user-id>'
Save the id to retrieve the result later.
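If you are handling the create call's JSON output programmatically, a minimal sketch of extracting the id and initial status might look like this (the exact payload shape, beyond the `id` and `status` fields described above, is an assumption):

```python
import json

def parse_create_response(raw: str) -> tuple[str, str]:
    """Extract the response id and status from the create call's JSON output."""
    body = json.loads(raw)
    return body["id"], body["status"]

# Example payload shaped like the doc describes: a newly queued response.
raw = '{"id": "resp_abc123", "status": "queued"}'
response_id, status = parse_create_response(raw)
print(response_id, status)  # resp_abc123 queued
```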

Poll for the result

Call GET /responses/{response_id} until the status reaches a terminal state.
mka1 llm responses get --response-id resp_abc123
A response moves through these statuses as it is processed:
Status       Meaning
queued       The request is waiting to be picked up
in_progress  The model is generating output
completed    Generation finished successfully
failed       An error occurred during processing
incomplete   The response was cut short (e.g. token limit)
cancelled    The response was cancelled before completing
Poll at a reasonable interval (for example, every two seconds) until the status is no longer queued or in_progress.
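The polling loop above can be sketched as follows. This is a hedged illustration, not the official client: `fetch_status` stands in for whatever wrapper you write around GET /responses/{response_id} (or the CLI command), and the demo stubs it out:

```python
import time

# The terminal statuses from the table above.
TERMINAL_STATUSES = {"completed", "failed", "incomplete", "cancelled"}

def poll_until_done(fetch_status, response_id, interval=2.0, sleep=time.sleep):
    """Poll until the response leaves queued/in_progress, then return the status.

    `fetch_status` is any callable taking a response id and returning the
    current status string.
    """
    while True:
        status = fetch_status(response_id)
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)

# Demo with a stubbed fetcher that completes on the third poll.
statuses = iter(["queued", "in_progress", "completed"])
final = poll_until_done(lambda _id: next(statuses), "resp_abc123", sleep=lambda _: None)
print(final)  # completed
```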

Stream events from a background response

If you want real-time updates instead of polling, retrieve the response with stream set to true. The API returns server-sent events as the response is processed.
mka1 llm responses get --response-id resp_abc123 --stream
Events arrive as they are produced. The stream closes after a terminal event such as response.completed or response.failed. If the response has already completed when you call this endpoint, you receive a single terminal event with the final response and the stream closes immediately.
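A consumer of this stream needs to stop once a terminal event arrives. As a sketch, assuming the standard server-sent-events framing (`event: <type>` lines) and that `response.incomplete` and `response.cancelled` are also terminal, which the text above implies but does not list:

```python
TERMINAL_EVENTS = {"response.completed", "response.failed",
                   "response.incomplete", "response.cancelled"}

def read_until_terminal(lines):
    """Consume SSE-style lines, returning the event types seen.

    Stops after the first terminal event, mirroring how the stream closes.
    """
    seen = []
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):].strip()
            seen.append(event)
            if event in TERMINAL_EVENTS:
                break
    return seen

# Demo on a hand-written stream; the trailing line is never reached.
stream = [
    "event: response.created",
    "data: {}",
    "event: response.output_text.delta",
    'data: {"delta": "Hi"}',
    "event: response.completed",
    "data: {}",
    "event: should.never.be.reached",
]
print(read_until_terminal(stream))
# ['response.created', 'response.output_text.delta', 'response.completed']
```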

Stream events at creation time

You can also stream events directly when creating a background response by setting both background and stream to true.
mka1 llm responses create \
  --background \
  --stream \
  --model auto \
  --input '"Write a 500-word essay about the history of the internet."'
The first event is response.queued, followed by response.created, intermediate events such as response.output_text.delta, and finally a terminal event like response.completed. This is useful when you want to show progress in a UI while the work runs in the background. If the client disconnects, the response continues processing and can be retrieved later.
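For showing progress in a UI, the delta events can be accumulated into the output text as they arrive. A minimal sketch, assuming each `response.output_text.delta` event carries its text chunk in a `delta` field (the field name is an assumption about the event payload):

```python
def accumulate_text(events):
    """Build the output text from response.output_text.delta events.

    `events` yields (event_type, data_dict) pairs; accumulation stops
    at response.completed.
    """
    parts = []
    for event_type, data in events:
        if event_type == "response.output_text.delta":
            parts.append(data.get("delta", ""))  # "delta" field is assumed
        elif event_type == "response.completed":
            break
    return "".join(parts)

# Demo with events in the order the doc describes.
events = [
    ("response.queued", {}),
    ("response.created", {}),
    ("response.output_text.delta", {"delta": "The internet "}),
    ("response.output_text.delta", {"delta": "began as ARPANET."}),
    ("response.completed", {}),
]
print(accumulate_text(events))  # The internet began as ARPANET.
```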

Cancel a background response

If you no longer need the result, cancel a queued or in-progress response.
mka1 llm responses cancel --response-id resp_abc123
The response status changes to cancelled. Responses that have already completed or failed cannot be cancelled.
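Since only queued or in-progress responses can be cancelled, client code may want to guard the cancel call on the last-known status. A small sketch, where `cancel` stands in for whatever wrapper you write around the cancel command or endpoint:

```python
# Statuses the doc says can still be cancelled.
CANCELLABLE = {"queued", "in_progress"}

def try_cancel(status, cancel):
    """Invoke `cancel` only if the response can still be cancelled."""
    if status in CANCELLABLE:
        cancel()
        return True
    return False

calls = []
print(try_cancel("in_progress", lambda: calls.append("cancelled")))  # True
print(try_cancel("completed", lambda: calls.append("cancelled")))    # False
```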

Next steps