POST /api/v1/llm/responses
Python (SDK)
from mka1 import SDK


with SDK(
    bearer_auth="<YOUR_BEARER_TOKEN_HERE>",
) as sdk:

    # stream=False returns one complete response object rather than an event stream.
    res = sdk.llm.responses.create(
        model="meetkai:functionary-urdu-mini-pak",
        input="What is the capital of France?",
        stream=False,
        store=True,
        background=False,
        parallel_tool_calls=True,
        max_tool_calls=30,
        truncation="auto",
        service_tier="auto",
    )

    # Handle the response
    print(res)
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1735689600,
  "completed_at": 1735689601,
  "status": "completed",
  "error": null,
  "incomplete_details": null,
  "background": false,
  "instructions": null,
  "max_output_tokens": null,
  "max_tool_calls": 30,
  "metadata": {},
  "model": "meetkai:functionary-urdu-mini-pak",
  "output": [
    {
      "type": "message",
      "id": "msg_abc123",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": []
        }
      ],
      "status": "completed"
    }
  ],
  "output_text": "The capital of France is Paris.",
  "parallel_tool_calls": true,
  "previous_response_id": null,
  "reasoning": {
    "effort": null,
    "summary": null
  },
  "service_tier": "auto",
  "store": true,
  "text": {
    "format": {
      "type": "text"
    },
    "verbosity": "medium"
  },
  "tool_choice": "auto",
  "tools": [],
  "truncation": "auto",
  "usage": {
    "input_tokens": 8,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 7,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 15
  },
  "user": null
}
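
The output array can hold items other than messages (tool calls, reasoning items, and so on), so reading the reply usually means walking it. The following is a minimal sketch that relies only on the fields visible in the sample response above; collect_output_text is a hypothetical helper name, not part of the SDK.

```python
def collect_output_text(response: dict) -> str:
    """Concatenate every output_text part found in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool calls, reasoning items, and other item types
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


# Trimmed copy of the sample response shown above.
sample = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "text": "The capital of France is Paris.",
                    "annotations": [],
                }
            ],
        }
    ],
}
print(collect_output_text(sample))
```

In the sample, the top-level output_text field carries the same string, so this walk is mainly useful when you need per-item detail such as annotations.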

Authorizations

Authorization
string
header
required

Gateway auth: send Authorization: Bearer <mka1-api-key>. For multi-user server-side integrations, you can also send X-On-Behalf-Of: <external-user-id>.
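
If you call the endpoint without the SDK, the two headers described above could be assembled as follows. This is a sketch only; build_auth_headers is a hypothetical helper name, and the key and user ID are placeholders.

```python
from typing import Optional


def build_auth_headers(api_key: str, on_behalf_of: Optional[str] = None) -> dict:
    """Build the gateway auth headers; X-On-Behalf-Of is added only when given."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if on_behalf_of is not None:
        # Used by multi-user server-side integrations.
        headers["X-On-Behalf-Of"] = on_behalf_of
    return headers


print(build_auth_headers("<mka1-api-key>", on_behalf_of="<external-user-id>"))
```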

Body

application/json

Request schema for creating a new agent response. Configures the agent's model, input, tools, output format, and behavior. Supports both foreground (blocking) and background (asynchronous) execution with optional streaming.

model
string
required

The ID of the model to use for generating the response.

input

The user input to send to the agent. Can be a simple text string for basic queries, or an array of item objects containing text, images, files, or audio for multimodal interactions. This is the main content the agent will respond to.

instructions
string

System or developer instructions inserted into the model's context before user input. Use this to guide the agent's behavior, set personality, define constraints, or provide domain-specific knowledge. Acts as a persistent system message for this response.

conversation

The conversation that this response belongs to. Can be a conversation ID string or a conversation object. Used to maintain context and history across multiple agent interactions. Optional - omit for one-off interactions.

previous_response_id
string

The unique ID of a previous response to continue from. Used for multi-turn conversations to maintain context and history. The agent will have access to all previous interactions in the chain.
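
A follow-up turn can be expressed by pointing previous_response_id at the id of the earlier response. The sketch below only builds the request body; follow_up_body is a hypothetical helper name, and reusing the earlier response's model is an illustrative choice, not a requirement stated by this reference.

```python
def follow_up_body(previous: dict, user_input: str) -> dict:
    """Build a request body that continues from an earlier response object."""
    return {
        "model": previous["model"],  # illustrative choice: keep the same model
        "input": user_input,
        "previous_response_id": previous["id"],
    }


first = {"id": "resp_abc123", "model": "meetkai:functionary-urdu-mini-pak"}
print(follow_up_body(first, "And what is its population?"))
```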

prompt
object

Reference to a prompt template to use for this response, along with variables to substitute. Allows using predefined, versioned prompt templates instead of inline instructions. The template system supports variable interpolation.

include
enum<string>[]

Additional fields to include in the response output. Allows requesting specific nested data like web search sources, code interpreter outputs, computer screenshots, file search results, input images, output logprobs, or reasoning content. These fields may have performance or cost implications.

Available options:
web_search_call.action.sources,
code_interpreter_call.outputs,
computer_call_output.output.image_url,
file_search_call.results,
message.input_image.image_url,
message.output_text.logprobs,
reasoning.encrypted_content
text
object

Configuration options for text output from the model. Includes response format (text, JSON object, JSON schema) and verbosity level (low, medium, high). Use this to control output structure and detail level.

stream
boolean
default:false

If set to true, the response data will be streamed using Server-Sent Events (SSE) for real-time updates as the agent generates the response. When false, the response is returned as a single complete object. Defaults to false.
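
When stream is true and you read the HTTP body yourself, the SSE framing can be decoded with a small parser. The sketch below follows the generic Server-Sent Events line format and assumes that every data payload is a JSON object. The event shape used in the demo (type and delta keys) is hypothetical and is not taken from this reference.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE event from text lines."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is not data
        if field == "data":
            data_lines.append(value)
    if data_lines:
        # Lenient: also emit an event that was not closed by a blank line.
        yield json.loads("\n".join(data_lines))


# Hypothetical event payloads, for illustration only.
demo = [
    'data: {"type": "delta", "delta": "Par"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"type": "delta", "delta": "is"}\n',
    "\n",
]
print("".join(event["delta"] for event in iter_sse_data(demo)))
```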

stream_options
object

Additional options for configuring streaming behavior when stream is enabled.

store
boolean
default:true

Whether to store the generated response for later retrieval. When true, the response is saved and can be retrieved via GET /responses/{id}. When false, the response is not persisted after generation. Defaults to true.

background
boolean
default:false

Whether to run the model response in the background asynchronously. When true, the request returns immediately with a response ID while the agent processes in the background. Use GET /responses/{id} to retrieve results later. When false, the request blocks until completion. Defaults to false.
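
A background request is typically followed by polling until the status reaches a final value. The sketch below takes the fetch step as a callable so that it stays independent of any HTTP client; wait_for_response is a hypothetical helper name, and treating completed, failed, cancelled, and incomplete as final is an assumption drawn from the status values listed in the Response section.

```python
import time
from typing import Callable

# Assumed final states, taken from the documented status enum.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


def wait_for_response(
    fetch: Callable[[str], dict],
    response_id: str,
    interval: float = 1.0,
    max_attempts: int = 60,
) -> dict:
    """Call fetch(response_id) until the response reaches a terminal status."""
    for _ in range(max_attempts):
        response = fetch(response_id)
        if response["status"] in TERMINAL_STATUSES:
            return response
        time.sleep(interval)
    raise TimeoutError(f"{response_id} did not finish after {max_attempts} attempts")


# Demo with a fake fetcher standing in for GET /responses/{id}.
_states = iter(["queued", "in_progress", "completed"])
result = wait_for_response(
    lambda rid: {"id": rid, "status": next(_states)}, "resp_abc123", interval=0
)
print(result["status"])
```

If you set webhook_url, the gateway notifies you of status changes, so polling like this is only needed as a fallback.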

webhook_url
string<uri>

URL to receive webhook notifications for status changes (queued, in_progress, completed, failed, incomplete). Only valid when background=true. If provided without background=true, the request will be rejected.

webhook_secret
string

Optional secret for HMAC-SHA256 signing of webhook payloads. When provided, webhooks include an X-Webhook-Signature header with the signature. Requires webhook_url to be set.

Required string length: 16 - 256
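
On the receiving side, the signature can be checked by recomputing the HMAC over the payload. The sketch below assumes that the X-Webhook-Signature header carries a bare hex-encoded HMAC-SHA256 digest of the raw request body. This reference does not state the encoding or any prefix, so confirm the format before relying on it; verify_webhook_signature is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_webhook_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True when signature_header matches HMAC-SHA256(secret, body).

    Assumption: the header value is a bare hex digest of the raw body.
    """
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking match length.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


secret = "0123456789abcdef"  # placeholder, 16 characters (the documented minimum)
payload = b'{"id": "resp_abc123", "status": "completed"}'
signature = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
print(verify_webhook_signature(secret, payload, signature))
```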
tools
(Function tool definition · object | File search tool definition · object | Computer use tool definition · object | Web search tool definition · object | MCP tool definition · object | Code interpreter tool definition · object | Image generation tool definition · object | Local shell tool definition · object | Shell tool definition · object | Custom tool definition · object | Web search preview tool definition · object | Hosted tool definition · object | History tool definition · object)[]

Array of tool definitions that the agent can use during response generation. Supports built-in tools (web search, file search, code interpreter, computer use, image generation) and custom tools (functions, MCP, local shell). Each tool definition specifies its type and configuration.

Function tool definition: Defines a callable function tool. Provide a tool name and parameters schema, with optional description and deferred loading.

tool_choice

Controls how the agent selects which tools to use. Can be 'none' (no tools), 'auto' (model decides), 'required' (must use tools), or a specific tool selection object. Use this to force or prevent tool usage.

Available options:
none,
auto,
required
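
A request body that combines tools and tool_choice might look like the sketch below. The name, description, and parameters keys follow the function tool description above; the "type": "function" discriminator and the get_weather tool itself are assumptions made for illustration, so check the function tool schema before using this shape.

```python
import json

request_body = {
    "model": "meetkai:functionary-urdu-mini-pak",
    "input": "What is the weather in Lahore?",
    "tools": [
        {
            "type": "function",  # assumed discriminator, not confirmed by this page
            "name": "get_weather",  # hypothetical tool
            "description": "Look up the current weather for a city.",
            "parameters": {  # JSON Schema describing the arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
    "parallel_tool_calls": False,  # call tools one at a time
    "max_tool_calls": 5,
}
print(json.dumps(request_body, indent=2))
```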
parallel_tool_calls
boolean
default:true

Whether to allow the agent to execute multiple tool calls in parallel. When true, the agent can make concurrent tool calls for efficiency. When false, tools are called sequentially. Defaults to true for better performance.

max_tool_calls
integer
default:30

The maximum total number of tool calls the agent can make during this response. Useful for controlling execution time and preventing infinite loops. Applies to all built-in tools like web search, file search, code interpreter, etc. Must be a positive integer between 1 and 300. Defaults to 30.

Required range: 1 <= x <= 300
reasoning
object

Configuration options for reasoning models. Controls reasoning effort level (minimal, low, medium, high) and summary verbosity (auto, concise, detailed). Only applicable to specific reasoning-capable models.

auto_routing
boolean

When true, the gateway inspects request complexity and automatically routes between quantized, MoE, and dense variants within the requested model family.

auto_routing_debug
boolean

Gateway extension. When true alongside auto_routing, the response metadata includes a compact serialized routing decision so you can inspect the inferred target tier even when no compatible sibling variant is available.

max_output_tokens
integer

The maximum number of tokens the model can generate in its output. Sets an upper bound to control costs and response length. The actual output may be shorter if the model finishes naturally. Must be a positive integer.

Required range: 1 <= x <= 9007199254740991
temperature
number

Controls randomness in the model's output. Higher values (e.g., 1.5-2.0) make output more random and creative, lower values (e.g., 0.0-0.5) make it more focused and deterministic. Must be between 0 and 2.

Required range: 0 <= x <= 2
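
The effect of temperature can be illustrated with the standard temperature-scaled softmax. This is a sketch of the general technique, not the gateway's implementation; treating a temperature of 0 as greedy selection is an assumption made here to avoid dividing by zero.

```python
import math


def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # Assumption: temperature 0 means greedy, all mass on the top logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures concentrate probability on the highest-scoring token, and higher temperatures flatten the distribution.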
top_p
number

Nucleus sampling parameter. The model samples only from the smallest set of most likely tokens whose cumulative probability reaches top_p. Lower values (e.g., 0.1) make output more focused; higher values (e.g., 0.9) allow more diversity. Must be between 0 and 1. Use as an alternative to temperature.

Required range: 0 <= x <= 1
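
Nucleus sampling can be sketched as keeping the most likely tokens until their cumulative probability reaches top_p, then renormalizing. This illustrates the general technique rather than the gateway's implementation; nucleus_filter is a hypothetical helper name.

```python
def nucleus_filter(probs: dict, top_p: float) -> dict:
    """Keep the most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete
    total = sum(p for _, p in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {token: p / total for token, p in kept}


probs = {"Paris": 0.5, "Lyon": 0.3, "Nice": 0.2}
print(nucleus_filter(probs, top_p=0.7))
```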
top_logprobs
integer

The number of most likely tokens to return at each position along with their log probabilities. Must be between 0 and 20. Useful for understanding model confidence and exploring alternative outputs.

Required range: 0 <= x <= 20
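
The documented ranges and the webhook rules above can be checked on the client before a request is sent. The sketch below is a hypothetical validator, not part of the SDK, and the gateway's own validation remains authoritative.

```python
# Inclusive ranges copied from the field descriptions above.
RANGES = {
    "max_tool_calls": (1, 300),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "top_logprobs": (0, 20),
}


def validate_request(body: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []
    for field, (low, high) in RANGES.items():
        value = body.get(field)
        if value is not None and not (low <= value <= high):
            errors.append(f"{field} must be between {low} and {high}")
    if body.get("webhook_url") and not body.get("background", False):
        errors.append("webhook_url requires background=true")
    secret = body.get("webhook_secret")
    if secret is not None:
        if not body.get("webhook_url"):
            errors.append("webhook_secret requires webhook_url")
        if not 16 <= len(secret) <= 256:
            errors.append("webhook_secret must be 16 to 256 characters long")
    return errors


print(validate_request({"model": "m", "temperature": 3, "webhook_url": "https://example.com/hook"}))
```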
truncation
enum<string>
default:auto

The truncation strategy for handling inputs that exceed the model's context window. 'auto' automatically truncates old messages to fit, 'disabled' returns an error if context is too long. Defaults to 'auto'.

Available options:
auto,
disabled
context_management
object[]

Context management strategies to apply during response generation. Supports compaction, which summarizes older conversation history when the context exceeds a token threshold.

Maximum array length: 1
presence_penalty
number

Penalizes new tokens based on whether they appear in the text so far. Higher values increase the model's likelihood to talk about new topics.

Required range: -2 <= x <= 2
frequency_penalty
number

Penalizes new tokens based on their frequency in the text so far. Higher values decrease the model's likelihood to repeat the same line verbatim.

Required range: -2 <= x <= 2
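
The two penalties are commonly described as adjustments to token logits: the frequency penalty scales with how often a token has already appeared, while the presence penalty is applied once to any token that has appeared at all. The sketch below shows that common formulation; this reference does not specify the gateway's exact formula, so treat it as an assumption.

```python
from collections import Counter


def apply_penalties(
    logits: dict,
    generated: list,
    presence_penalty: float = 0.0,
    frequency_penalty: float = 0.0,
) -> dict:
    """Lower the logits of tokens that already appear in the generated text."""
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        seen = 1 if count > 0 else 0
        # Frequency term grows with each repeat; presence term applies once.
        adjusted[token] = logit - frequency_penalty * count - presence_penalty * seen
    return adjusted


print(apply_penalties(
    {"the": 1.0, "cat": 1.0},
    generated=["the", "the"],
    presence_penalty=0.5,
    frequency_penalty=0.25,
))
```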
service_tier
enum<string>
default:auto

Specifies the processing tier for serving the request. 'auto' lets the system choose, 'default' uses standard processing, 'flex' allows flexible scheduling for lower priority, 'priority' provides faster processing. Defaults to 'auto'.

Available options:
auto,
default,
flex,
priority
prompt_cache_key
string

A stable identifier used for caching prompts and context. Helps reduce costs and latency by reusing cached prompt processing. Useful for prompts that are used repeatedly with minor variations.

safety_identifier
string

A stable identifier used to help detect and prevent policy violations across multiple requests. Helps the system identify patterns of abuse or misuse. Useful for compliance tracking and safety monitoring.

metadata
object
modalities
enum<string>[]

Output types that the model should generate. Defaults to ['text']. Include 'audio' to receive an audio rendering of the text response via TTS.

Available options:
text,
audio
audio
object

Audio output configuration. Optional when modalities includes 'audio'; defaults are applied if omitted.

user
string

DEPRECATED: A unique identifier for the end-user. Use safety_identifier for policy violation detection and prompt_cache_key for caching instead. This field is maintained for backwards compatibility.

Response

Successful response - returns either streaming events (SSE) or a complete response object (JSON) depending on the stream parameter

id
string
required
object
any
required
created_at
number
required
completed_at
number | null
required
status
enum<string>
required

The overall status of the response generation. 'completed' means successfully finished, 'failed' means error occurred, 'in_progress' means currently processing, 'cancelled' means user-cancelled, 'queued' means waiting to start, 'incomplete' means partial completion.

Available options:
completed,
failed,
in_progress,
cancelled,
queued,
incomplete
error
object
required
incomplete_details
object
required
background
boolean
required
instructions
required
max_output_tokens
integer | null
required
Required range: -9007199254740991 <= x <= 9007199254740991
Example:

null

max_tool_calls
integer | null
required
Required range: -9007199254740991 <= x <= 9007199254740991
metadata
object
required
model
string
required
output
(Input message item · object | Output message item · object | Output audio item · object | File search call item · object | Computer call item · object | Computer call output item · object | Web search call item · object | Function call item · object | Function call output item · object | Reasoning item · object | Image generation call item · object | Code interpreter call item · object | Local shell call item · object | Local shell call output item · object | Shell call item · object | Shell call output item · object | MCP list tools item · object | MCP approval request item · object | MCP approval response item · object | MCP call item · object | Custom tool call item · object | Custom tool call output item · object | Item reference item · object | Compaction item · object)[]
required

Input message item: A message with role and content. Use this for user, assistant, system, or developer turns in structured inputs.

parallel_tool_calls
boolean
required
previous_response_id
string | null
required
store
boolean
required
text
object
required
tool_choice
required

Tool choice mode: Selects how the model decides tool usage. Use none, auto, or required.

Available options:
none,
auto,
required
tools
(Function tool definition · object | File search tool definition · object | Computer use tool definition · object | Web search tool definition · object | MCP tool definition · object | Code interpreter tool definition · object | Image generation tool definition · object | Local shell tool definition · object | Shell tool definition · object | Custom tool definition · object | Web search preview tool definition · object | Hosted tool definition · object | History tool definition · object)[]
required

Function tool definition: Defines a callable function tool. Provide a tool name and parameters schema, with optional description and deferred loading.

truncation
enum<string>
required
Available options:
auto,
disabled
usage
object
required
user
string | null
required
conversation
object
output_text
string
prompt
object
prompt_cache_key
string
reasoning
object
safety_identifier
string
service_tier
enum<string>
Available options:
auto,
default,
flex,
priority
temperature
number
Required range: 0 <= x <= 2
presence_penalty
number
frequency_penalty
number
top_logprobs
integer
Required range: 0 <= x <= 20
top_p
number
Required range: 0 <= x <= 1
context_management
object[]