Use this pattern when you want one response to break work into smaller tasks. The pattern implements a synchronous child-response delegation loop: a parent response calls a function tool that your app resolves by running another Responses request. One or more child results are returned to the parent as function_call_output items, and the parent continues generation.

A child response is sometimes called a subagent. In this guide, it is just another Responses request that performs a focused task.

Parent response -> function_call spawn_subagent -> your app runs one or more child responses -> child results returned as function_call_output -> parent response resumes generation

How the loop works

In this pattern the parent response pauses whenever it calls spawn_subagent. Your app executes the delegated tasks and then resumes the parent with the results.
  1. Create a parent response with a spawn_subagent function tool.
  2. When the model calls the tool, parse each tool call’s arguments.
  3. Run one or more child Responses requests to perform the delegated tasks.
  4. Wait for all child responses to finish.
  5. Return each child result as function_call_output using the matching call_id.
  6. Resume the parent response with previous_response_id.
  7. Repeat until the parent produces a normal assistant message.
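Step 2 is where malformed tool calls surface: the model returns arguments as a JSON string, and even with strict: true a defensive parse keeps the loop robust. A minimal sketch in TypeScript; the SpawnSubagentArgs shape mirrors the tool schema defined in the next section, and parseSpawnSubagentArgs is an illustrative helper, not part of the SDK:

```typescript
// Shape matching the spawn_subagent tool schema below.
type SpawnSubagentArgs = {
  task: string;
  instructions?: string;
  model?: string;
};

// Parse and validate a function_call's arguments string before
// running the child response.
function parseSpawnSubagentArgs(raw: string): SpawnSubagentArgs {
  const parsed = JSON.parse(raw) as Record<string, unknown>;
  if (typeof parsed.task !== "string" || parsed.task.trim() === "") {
    throw new Error("spawn_subagent arguments must include a non-empty task");
  }
  return {
    task: parsed.task,
    instructions:
      typeof parsed.instructions === "string" ? parsed.instructions : undefined,
    model: typeof parsed.model === "string" ? parsed.model : undefined,
  };
}
```

Failing fast here means a bad tool call surfaces as an app-level error rather than a confusing child response.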

Define a tool for delegation

Keep the tool focused. Pass only the fields the child response needs.
{
  "type": "function",
  "name": "spawn_subagent",
  "description": "Delegate a focused task to a child response and return the result.",
  "strict": true,
  "parameters": {
    "type": "object",
    "properties": {
      "task": {
        "type": "string",
        "description": "The task for the child response."
      },
      "instructions": {
        "type": "string",
        "description": "Optional instructions for the child response."
      },
      "model": {
        "type": "string",
        "description": "Optional model override for the child response."
      }
    },
    "required": ["task"],
    "additionalProperties": false
  }
}
Use tool_choice: "auto" when the model should decide when to delegate. Use tool_choice: "required" when every turn must go through a tool.
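As a sketch, the two request configurations differ only in the tool_choice field (spawnSubagentTool here is an abbreviated stand-in for the full tool definition above):

```typescript
// Abbreviated tool shape; the full definition appears above.
const spawnSubagentTool = { type: "function", name: "spawn_subagent" };

// The model decides when to delegate.
const autoRequest = {
  tools: [spawnSubagentTool],
  tool_choice: "auto" as const,
};

// Every turn must go through a tool call.
const requiredRequest = {
  tools: [spawnSubagentTool],
  tool_choice: "required" as const,
};
```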

TypeScript SDK recipe

This example uses sdk.llm.responses.create for both the parent and child responses. It allows the parent to emit multiple spawn_subagent calls in one turn. Your app runs those child responses concurrently, waits for all of them to finish, and then resumes the parent once with every tool result. Keep each child result small so the parent can use it in the next turn without consuming too much context.
import { SDK } from "@meetkai/mka1";

const sdk = new SDK({
  bearerAuth: process.env.MKA1_API_KEY!,
});

const spawnSubagentTool = {
  type: "function" as const,
  name: "spawn_subagent",
  description: "Delegate a focused task to a child response and return the result.",
  strict: true,
  parameters: {
    type: "object",
    properties: {
      task: { type: "string" },
      instructions: { type: "string" },
      model: { type: "string" },
    },
    required: ["task"],
    additionalProperties: false,
  },
};

type SpawnSubagentArgs = {
  task: string;
  instructions?: string;
  model?: string;
};

async function runChildResponse(args: SpawnSubagentArgs) {
  const child = await sdk.llm.responses.create({
    model: args.model ?? "meetkai:functionary-urdu-mini-pak",
    instructions:
      args.instructions ??
      "You are a specialist assistant. Complete the task and return the result only.",
    input: args.task,
    store: true,
  });

  return {
    response_id: child.id,
    output_text: child.outputText,
  };
}

export async function runDelegatingAgent(input: string) {
  let response = await sdk.llm.responses.create({
    model: "meetkai:functionary-urdu-mini-pak",
    instructions:
      "You are an orchestrator. Use spawn_subagent for focused side tasks. You may call it multiple times in one turn when tasks can be parallelized. After all tool results return, answer the user directly.",
    input,
    tools: [spawnSubagentTool],
    tool_choice: "auto",
    parallel_tool_calls: true,
    max_tool_calls: 8,
    store: true,
  });

  while (true) {
    const toolCalls = response.output.filter(
      (item): item is {
        type: "function_call";
        name: string;
        call_id: string;
        arguments: string;
      } => item.type === "function_call" && item.name === "spawn_subagent",
    );

    if (toolCalls.length === 0) {
      return response.outputText;
    }

    const toolOutputs = await Promise.all(
      toolCalls.map(async (toolCall) => {
        const args = JSON.parse(toolCall.arguments) as SpawnSubagentArgs;
        const childResult = await runChildResponse(args);

        return {
          type: "function_call_output" as const,
          call_id: toolCall.call_id,
          output: JSON.stringify(childResult),
        };
      }),
    );

    response = await sdk.llm.responses.create({
      model: response.model,
      previous_response_id: response.id,
      input: toolOutputs,
      tools: [spawnSubagentTool],
      parallel_tool_calls: true,
      max_tool_calls: 8,
      store: true,
    });
  }
}

Fan out and wait for all child responses

When parallel_tool_calls is true, the parent can emit several function_call items in one turn. Treat that set of tool calls as a batch. Start every child response, wait for all of them to finish, and only then resume the parent. This creates a barrier:
  1. Parent response emits many function_call items.
  2. Your app starts many child responses.
  3. Your app waits for all child responses to complete.
  4. Your app sends all function_call_output items back in one follow-up request.
  5. Parent response continues with the full set of delegated results.
If you resume the parent early with only part of the batch, the model continues without the missing results. That usually makes the orchestration less predictable. The key fan-in step looks like this:
const toolOutputs = await Promise.all(
  toolCalls.map(async (toolCall) => {
    const args = JSON.parse(toolCall.arguments) as SpawnSubagentArgs;
    const childResult = await runChildResponse(args);

    return {
      type: "function_call_output" as const,
      call_id: toolCall.call_id,
      output: JSON.stringify(childResult),
    };
  }),
);
If your app acts for an end user, send the same X-On-Behalf-Of value on the parent and child requests. If a child response may take a long time, set background: true on the child request and poll with sdk.llm.responses.get until it completes. Either way, wait until every child result is available before you resume the parent response.
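The polling half of background: true can be factored into a small helper. This is a sketch that assumes the retrieved response exposes a status field which eventually reads "completed" or "failed"; check the Responses reference for the actual field names. The helper depends only on a getter function, so it is easy to adapt:

```typescript
// Minimal shape this poller needs from a retrieved response.
type PollableResponse = { status: string };

// Repeatedly call `get` until the response reports a terminal status,
// with a fixed delay between attempts and a bounded attempt count.
async function pollUntilComplete<T extends PollableResponse>(
  get: () => Promise<T>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await get();
    if (response.status === "completed" || response.status === "failed") {
      return response;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Child response did not finish within the polling window");
}

// Assumed usage with the SDK:
// const child = await sdk.llm.responses.create({ ...request, background: true });
// const done = await pollUntilComplete(() => sdk.llm.responses.get(child.id));
```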

Raw Responses request for the resume step

The critical handoff is the follow-up request. You pass every tool result back in input and point to the earlier parent turn with previous_response_id.
curl https://apigw.mka1.com/api/v1/llm/responses \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <mka1-api-key>' \
  --header 'X-On-Behalf-Of: <end-user-id>' \
  --data '{
    "model": "meetkai:functionary-urdu-mini-pak",
    "previous_response_id": "resp_parent123",
    "input": [
      {
        "type": "function_call_output",
        "call_id": "call_abc123",
        "output": "{\"response_id\":\"resp_child456\",\"output_text\":\"Research complete. Recommend a staged rollout.\"}"
      },
      {
        "type": "function_call_output",
        "call_id": "call_def456",
        "output": "{\"response_id\":\"resp_child789\",\"output_text\":\"Draft complete. Headline: Faster rollout with lower risk.\"}"
      }
    ]
  }'
If you are not acting for an end user, omit X-On-Behalf-Of.

Practical limits

  • Keep parallel_tool_calls set to true when the parent should be able to delegate several child responses in one turn.
  • Set parallel_tool_calls to false only when child responses share state or must run in order.
  • Set max_tool_calls so a parent response cannot loop forever.
  • Keep child outputs compact so the parent can incorporate the result without consuming too much context.
  • Keep store: true while you build the workflow so you can inspect parent and child responses later.
  • Use the Responses input items page in the API Reference when you need to debug the exact items sent back to the model.
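The "keep child outputs compact" point can be enforced mechanically before a result goes back to the parent. A minimal sketch; the 2000-character cap is an illustrative choice for this guide, not an API constraint:

```typescript
// Illustrative cap on how much child output the parent receives.
const MAX_OUTPUT_CHARS = 2000;

// Truncate a child result before packing it into function_call_output,
// so one verbose child cannot flood the parent's context window.
function compactChildOutput(text: string): string {
  if (text.length <= MAX_OUTPUT_CHARS) {
    return text;
  }
  return text.slice(0, MAX_OUTPUT_CHARS) + " [truncated]";
}
```

A visible truncation marker also lets the parent ask a follow-up child to re-summarize when the cut-off result is not enough.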

See also

Review generate text for the base Responses request pattern. Use manage conversations when you want the parent or child workflow to keep durable state outside a single response chain.