

Use background mode when a response may take a long time to complete, such as multi-step tool use or large generation tasks. The API returns immediately with a queued response, and you retrieve the result later by polling or streaming.

Create a background response

Set background to true and stream to false. The API creates the response, starts processing it asynchronously, and returns immediately with status: "queued".
mka1 llm responses create \
  --background \
  --model auto \
  --input '"Write a 500-word essay about the history of the internet."' \
  -H 'X-On-Behalf-Of: <end-user-id>'
Save the id to retrieve the result later.
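If you are handling the create call's JSON output programmatically, a minimal sketch of extracting the id and initial status might look like this (the exact payload shape, beyond the `id` and `status` fields described above, is an assumption):

```python
import json

def parse_create_response(raw: str) -> tuple[str, str]:
    """Extract the response id and status from the create call's JSON output."""
    body = json.loads(raw)
    return body["id"], body["status"]

# Example payload shaped like the doc describes: a newly queued response.
raw = '{"id": "resp_abc123", "status": "queued"}'
response_id, status = parse_create_response(raw)
print(response_id, status)  # resp_abc123 queued
```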

Poll for the result

Call GET /responses/{response_id} until the status reaches a terminal state.
mka1 llm responses get --response-id resp_abc123
A response moves through these statuses as it is processed:
Status       Meaning
queued       The request is waiting to be picked up
in_progress  The model is generating output
completed    Generation finished successfully
failed       An error occurred during processing
incomplete   The response was cut short (e.g. token limit)
cancelled    The response was cancelled before completing
Poll at a reasonable interval (for example, every two seconds) until the status is no longer queued or in_progress.
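The polling loop above can be sketched as follows. This is a hedged illustration, not the official client: `fetch_status` stands in for whatever wrapper you write around GET /responses/{response_id} (or the CLI command), and the demo stubs it out:

```python
import time

# The terminal statuses from the table above.
TERMINAL_STATUSES = {"completed", "failed", "incomplete", "cancelled"}

def poll_until_done(fetch_status, response_id, interval=2.0, sleep=time.sleep):
    """Poll until the response leaves queued/in_progress, then return the status.

    `fetch_status` is any callable taking a response id and returning the
    current status string.
    """
    while True:
        status = fetch_status(response_id)
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)

# Demo with a stubbed fetcher that completes on the third poll.
statuses = iter(["queued", "in_progress", "completed"])
final = poll_until_done(lambda _id: next(statuses), "resp_abc123", sleep=lambda _: None)
print(final)  # completed
```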

Stream events from a background response

If you want real-time updates instead of polling, retrieve the response with stream set to true. The API returns server-sent events as the response is processed.
mka1 llm responses get --response-id resp_abc123 --stream
Events arrive as they are produced. The stream closes after a terminal event such as response.completed or response.failed. If the response has already completed when you call this endpoint, you receive a single terminal event with the final response and the stream closes immediately.
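A consumer of this stream needs to stop once a terminal event arrives. As a sketch, assuming the standard server-sent-events framing (`event: <type>` lines) and that `response.incomplete` and `response.cancelled` are also terminal, which the text above implies but does not list:

```python
TERMINAL_EVENTS = {"response.completed", "response.failed",
                   "response.incomplete", "response.cancelled"}

def read_until_terminal(lines):
    """Consume SSE-style lines, returning the event types seen.

    Stops after the first terminal event, mirroring how the stream closes.
    """
    seen = []
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):].strip()
            seen.append(event)
            if event in TERMINAL_EVENTS:
                break
    return seen

# Demo on a hand-written stream; the trailing line is never reached.
stream = [
    "event: response.created",
    "data: {}",
    "event: response.output_text.delta",
    'data: {"delta": "Hi"}',
    "event: response.completed",
    "data: {}",
    "event: should.never.be.reached",
]
print(read_until_terminal(stream))
# ['response.created', 'response.output_text.delta', 'response.completed']
```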

Stream events at creation time

You can also stream events directly when creating a background response by setting both background and stream to true.
mka1 llm responses create \
  --background \
  --stream \
  --model auto \
  --input '"Write a 500-word essay about the history of the internet."'
The first event is response.queued, followed by response.created, intermediate events such as response.output_text.delta, and finally a terminal event like response.completed. This is useful when you want to show progress in a UI while the work runs in the background. If the client disconnects, the response continues processing and can be retrieved later.
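For showing progress in a UI, the delta events can be accumulated into the output text as they arrive. A minimal sketch, assuming each `response.output_text.delta` event carries its text chunk in a `delta` field (the field name is an assumption about the event payload):

```python
def accumulate_text(events):
    """Build the output text from response.output_text.delta events.

    `events` yields (event_type, data_dict) pairs; accumulation stops
    at response.completed.
    """
    parts = []
    for event_type, data in events:
        if event_type == "response.output_text.delta":
            parts.append(data.get("delta", ""))  # "delta" field is assumed
        elif event_type == "response.completed":
            break
    return "".join(parts)

# Demo with events in the order the doc describes.
events = [
    ("response.queued", {}),
    ("response.created", {}),
    ("response.output_text.delta", {"delta": "The internet "}),
    ("response.output_text.delta", {"delta": "began as ARPANET."}),
    ("response.completed", {}),
]
print(accumulate_text(events))  # The internet began as ARPANET.
```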

Cancel a background response

If you no longer need the result, cancel a queued or in-progress response.
mka1 llm responses cancel --response-id resp_abc123
The response status changes to cancelled. Responses that have already completed or failed cannot be cancelled.
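Since only queued or in-progress responses can be cancelled, client code may want to guard the cancel call on the last-known status. A small sketch, where `cancel` stands in for whatever wrapper you write around the cancel command or endpoint:

```python
# Statuses the doc says can still be cancelled.
CANCELLABLE = {"queued", "in_progress"}

def try_cancel(status, cancel):
    """Invoke `cancel` only if the response can still be cancelled."""
    if status in CANCELLABLE:
        cancel()
        return True
    return False

calls = []
print(try_cancel("in_progress", lambda: calls.append("cancelled")))  # True
print(try_cancel("completed", lambda: calls.append("cancelled")))    # False
```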

Next steps