Use this pattern when you want one response to break work into smaller tasks. It implements a synchronous child-response delegation loop: a parent response calls a function tool that your app resolves by running another Responses request. One or more child results are returned to the parent as function_call_output, and the parent then continues generation.

A child response is sometimes called a subagent. In this guide, it is just another Responses request that performs a focused task.

Parent response -> function_call spawn_subagent -> your app runs one or more child responses -> child results returned as function_call_output -> parent response resumes generation

How the loop works

In this pattern the parent response pauses whenever it calls spawn_subagent. Your app executes the delegated tasks and then resumes the parent with the results.
  1. Create a parent response with a spawn_subagent function tool.
  2. When the model calls the tool, parse each tool call’s arguments.
  3. Run one or more child Responses requests to perform the delegated tasks.
  4. Wait for all child responses to finish.
  5. Return each child result as function_call_output using the matching call_id.
  6. Resume the parent response with previous_response_id.
  7. Repeat until the parent produces a normal assistant message.
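The steps above can be sketched as a generic loop, independent of any SDK. The fake createResponse and runChild below stand in for real Responses requests so the control flow can run end to end; all names and shapes here are illustrative, not the real SDK types.

```typescript
// Minimal, SDK-agnostic sketch of the delegation loop.
type OutputItem =
  | { type: "function_call"; name: string; call_id: string; arguments: string }
  | { type: "message"; text: string };

type Response = { id: string; output: OutputItem[] };

// Fake API: the first turn delegates, the second turn answers.
let turn = 0;
async function createResponse(_input: unknown): Promise<Response> {
  turn += 1;
  if (turn === 1) {
    return {
      id: "resp_parent",
      output: [
        {
          type: "function_call",
          name: "spawn_subagent",
          call_id: "call_1",
          arguments: JSON.stringify({ task: "summarize" }),
        },
      ],
    };
  }
  return { id: "resp_final", output: [{ type: "message", text: "done" }] };
}

// Stand-in for a real child Responses request.
async function runChild(task: string): Promise<string> {
  return `result for ${task}`;
}

export async function delegationLoop(input: string): Promise<string> {
  let response = await createResponse(input);

  while (true) {
    const calls = response.output.filter(
      (item): item is Extract<OutputItem, { type: "function_call" }> =>
        item.type === "function_call" && item.name === "spawn_subagent",
    );

    // Step 7: no tool calls means the parent produced a normal message.
    if (calls.length === 0) {
      const message = response.output.find((i) => i.type === "message");
      return message && message.type === "message" ? message.text : "";
    }

    // Steps 3-5: run every child, then pair each result with its call_id.
    const outputs = await Promise.all(
      calls.map(async (call) => ({
        type: "function_call_output" as const,
        call_id: call.call_id,
        output: await runChild(JSON.parse(call.arguments).task),
      })),
    );

    // Step 6: resume the parent with the full set of tool results.
    response = await createResponse(outputs);
  }
}
```

The sections below fill in the real tool definition and SDK calls behind this skeleton.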

Define a tool for delegation

Keep the tool focused. Pass only the fields the child response needs.
{
  "type": "function",
  "name": "spawn_subagent",
  "description": "Delegate a focused task to a child response and return the result.",
  "strict": true,
  "parameters": {
    "type": "object",
    "properties": {
      "task": {
        "type": "string",
        "description": "The task for the child response."
      },
      "instructions": {
        "type": "string",
        "description": "Optional instructions for the child response."
      },
      "model": {
        "type": "string",
        "description": "Optional model override for the child response."
      }
    },
    "required": ["task"],
    "additionalProperties": false
  }
}
Use tool_choice: "auto" when the model should decide when to delegate. Use tool_choice: "required" when every turn must go through a tool.
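As a sketch, the two modes differ only in the request options. The fragment below shows just the relevant fields, with a minimal stand-in for the tool definition shown above:

```typescript
// Minimal stand-in for the spawn_subagent tool definition above.
const spawnSubagentTool = { type: "function", name: "spawn_subagent" } as const;

// The model decides when to delegate:
export const autoDelegation = {
  tools: [spawnSubagentTool],
  tool_choice: "auto" as const,
};

// Every turn must go through a tool call:
export const requiredDelegation = {
  tools: [spawnSubagentTool],
  tool_choice: "required" as const,
};
```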

TypeScript SDK recipe

This example uses sdk.llm.responses.create for both the parent and child responses. It allows the parent to emit multiple spawn_subagent calls in one turn. Your app runs those child responses concurrently, waits for all of them to finish, and then resumes the parent once with every tool result. Keep each child result small so the parent can use it in the next turn without consuming too much context.
import { SDK } from "@meetkai/mka1";

const sdk = new SDK({
  bearerAuth: process.env.MKA1_API_KEY!,
});

const spawnSubagentTool = {
  type: "function" as const,
  name: "spawn_subagent",
  description: "Delegate a focused task to a child response and return the result.",
  strict: true,
  parameters: {
    type: "object",
    properties: {
      task: { type: "string" },
      instructions: { type: "string" },
      model: { type: "string" },
    },
    required: ["task"],
    additionalProperties: false,
  },
};

type SpawnSubagentArgs = {
  task: string;
  instructions?: string;
  model?: string;
};

async function runChildResponse(args: SpawnSubagentArgs) {
  const child = await sdk.llm.responses.create({
    model: args.model ?? "auto",
    instructions:
      args.instructions ??
      "You are a specialist assistant. Complete the task and return the result only.",
    input: args.task,
    store: true,
  });

  return {
    response_id: child.id,
    output_text: child.outputText,
  };
}

export async function runDelegatingAgent(input: string) {
  let response = await sdk.llm.responses.create({
    model: "auto",
    instructions:
      "You are an orchestrator. Use spawn_subagent for focused side tasks. You may call it multiple times in one turn when tasks can be parallelized. After all tool results return, answer the user directly.",
    input,
    tools: [spawnSubagentTool],
    tool_choice: "auto",
    parallel_tool_calls: true,
    max_tool_calls: 8,
    store: true,
  });

  while (true) {
    const toolCalls = response.output.filter(
      (item): item is {
        type: "function_call";
        name: string;
        call_id: string;
        arguments: string;
      } => item.type === "function_call" && item.name === "spawn_subagent",
    );

    if (toolCalls.length === 0) {
      return response.outputText;
    }

    const toolOutputs = await Promise.all(
      toolCalls.map(async (toolCall) => {
        const args = JSON.parse(toolCall.arguments) as SpawnSubagentArgs;
        const childResult = await runChildResponse(args);

        return {
          type: "function_call_output" as const,
          call_id: toolCall.call_id,
          output: JSON.stringify(childResult),
        };
      }),
    );

    response = await sdk.llm.responses.create({
      model: response.model,
      previous_response_id: response.id,
      input: toolOutputs,
      tools: [spawnSubagentTool],
      parallel_tool_calls: true,
      max_tool_calls: 8,
      store: true,
    });
  }
}

Fan out and wait for all child responses

When parallel_tool_calls is true, the parent can emit several function_call items in one turn. Treat that set of tool calls as a batch. Start every child response, wait for all of them to finish, and only then resume the parent. This creates a barrier:
  1. Parent response emits many function_call items.
  2. Your app starts many child responses.
  3. Your app waits for all child responses to complete.
  4. Your app sends all function_call_output items back in one follow-up request.
  5. Parent response continues with the full set of delegated results.
If you resume the parent early with only part of the batch, the model continues without the missing results. That usually makes the orchestration less predictable. The key fan-in step looks like this:
const toolOutputs = await Promise.all(
  toolCalls.map(async (toolCall) => {
    const args = JSON.parse(toolCall.arguments) as SpawnSubagentArgs;
    const childResult = await runChildResponse(args);

    return {
      type: "function_call_output" as const,
      call_id: toolCall.call_id,
      output: JSON.stringify(childResult),
    };
  }),
);
If your app acts for an end user, send the same X-On-Behalf-Of value on the parent and child requests. If a child response may run for a long time, set background: true on the child request and poll with sdk.llm.responses.get until it completes. When you fan out to multiple child responses, wait until every child result is available before you resume the parent response.
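Polling a background child can be factored into a small helper. This sketch is generic over the getter so it can run without the API; with the real SDK you would pass (id) => sdk.llm.responses.get(id). The status values here ("queued", "in_progress", "completed", "failed") are assumptions about the polled response shape, not confirmed field values.

```typescript
// Assumed shape of a polled child response; adjust to the real SDK types.
type ChildStatus = "queued" | "in_progress" | "completed" | "failed";
type ChildSnapshot = { id: string; status: ChildStatus; outputText?: string };

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Poll until the child reaches a terminal state or the deadline passes.
export async function pollUntilTerminal(
  get: (id: string) => Promise<ChildSnapshot>,
  id: string,
  intervalMs = 1000,
  timeoutMs = 120_000,
): Promise<ChildSnapshot> {
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const snapshot = await get(id);
    if (snapshot.status === "completed" || snapshot.status === "failed") {
      return snapshot;
    }
    if (Date.now() >= deadline) {
      throw new Error(`child ${id} did not finish within ${timeoutMs}ms`);
    }
    await sleep(intervalMs);
  }
}
```

When several backgrounded children are in flight, await pollUntilTerminal for all of them (for example with Promise.all) before resuming the parent, exactly as with the synchronous fan-in above.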

Raw Responses request for the resume step

The critical handoff is the follow-up request. You pass every tool result back in input and point to the earlier parent turn with previous_response_id.
curl https://apigw.mka1.com/api/v1/llm/responses \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <mka1-api-key>' \
  --header 'X-On-Behalf-Of: <end-user-id>' \
  --data '{
    "model": "auto",
    "previous_response_id": "resp_parent123",
    "input": [
      {
        "type": "function_call_output",
        "call_id": "call_abc123",
        "output": "{\"response_id\":\"resp_child456\",\"output_text\":\"Research complete. Recommend a staged rollout.\"}"
      },
      {
        "type": "function_call_output",
        "call_id": "call_def456",
        "output": "{\"response_id\":\"resp_child789\",\"output_text\":\"Draft complete. Headline: Faster rollout with lower risk.\"}"
      }
    ]
  }'
If you are not acting for an end user, omit X-On-Behalf-Of.

Basic error handling and recovery

In real systems, this loop will fail sometimes. Common failures include malformed tool arguments, child request timeouts, upstream 5xx errors, rate limits, and child outputs that are too weak to be useful. The safest pattern is:
  • Parse tool arguments defensively.
  • Retry transient child request failures a small number of times with backoff.
  • Return a structured failure payload to the parent instead of crashing the whole batch when one child fails.
  • Keep child results small and explicit so the parent can decide whether to continue, retry, or answer with partial results.
One simple approach is to wrap child execution and always return either a success object or an error object:
type ChildSuccess = {
  ok: true;
  response_id: string;
  output_text: string;
};

type ChildFailure = {
  ok: false;
  error_code: "invalid_arguments" | "child_request_failed";
  message: string;
};

type ChildResult = ChildSuccess | ChildFailure;

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function runChildResponseWithRecovery(rawArgs: string): Promise<ChildResult> {
  let args: SpawnSubagentArgs;

  try {
    args = JSON.parse(rawArgs) as SpawnSubagentArgs;
    if (!args.task || typeof args.task !== "string") {
      return {
        ok: false,
        error_code: "invalid_arguments",
        message: "spawn_subagent requires a string task",
      };
    }
  } catch {
    return {
      ok: false,
      error_code: "invalid_arguments",
      message: "Could not parse spawn_subagent arguments as JSON",
    };
  }

  for (let attempt = 0; attempt < 3; attempt += 1) {
    try {
      const child = await sdk.llm.responses.create({
        model: args.model ?? "auto",
        instructions:
          args.instructions ??
          "You are a specialist assistant. Complete the task and return the result only.",
        input: args.task,
        store: true,
      });

      return {
        ok: true,
        response_id: child.id,
        output_text: child.outputText,
      };
    } catch (error) {
      const isLastAttempt = attempt === 2;
      if (isLastAttempt) {
        return {
          ok: false,
          error_code: "child_request_failed",
          message:
            error instanceof Error ? error.message : "Child response failed",
        };
      }

      await sleep(500 * (attempt + 1));
    }
  }

  return {
    ok: false,
    error_code: "child_request_failed",
    message: "Child response failed",
  };
}
Then fan the results back into the parent even if some children fail:
const toolOutputs = await Promise.all(
  toolCalls.map(async (toolCall) => {
    const childResult = await runChildResponseWithRecovery(toolCall.arguments);

    return {
      type: "function_call_output" as const,
      call_id: toolCall.call_id,
      output: JSON.stringify(childResult),
    };
  }),
);
This lets the parent see partial failures and continue reasoning. For example, the parent can decide to retry with a narrower task, launch a replacement subagent, or answer with the available successful child results while noting the gap. For long-running children, you can also combine this with background: true plus polling. The recovery logic stays the same: wait for the child to reach a terminal state, then return either a success payload or a structured failure payload to the parent.
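The same success-or-failure shape works for background children: poll to a terminal state, then map the final snapshot onto the payload the parent expects. This sketch mirrors the ChildResult type above and assumes a status field on the polled response, which is not confirmed by the SDK types shown here.

```typescript
// Payload shape returned to the parent, mirroring ChildResult above.
type ChildResult =
  | { ok: true; response_id: string; output_text: string }
  | { ok: false; error_code: "child_request_failed"; message: string };

// Assumed shape of a background child once it reaches a terminal state.
type Snapshot = {
  id: string;
  status: "queued" | "in_progress" | "completed" | "failed";
  outputText?: string;
};

// Map a terminal snapshot onto the payload returned to the parent.
export function toChildResult(snapshot: Snapshot): ChildResult {
  if (snapshot.status === "completed") {
    return {
      ok: true,
      response_id: snapshot.id,
      output_text: snapshot.outputText ?? "",
    };
  }
  return {
    ok: false,
    error_code: "child_request_failed",
    message: `child ${snapshot.id} ended in status ${snapshot.status}`,
  };
}
```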

Practical limits

  • Keep parallel_tool_calls set to true when the parent should be able to delegate several child responses in one turn.
  • Set parallel_tool_calls to false only when child responses share state or must run in order.
  • Set max_tool_calls so a parent response cannot loop forever.
  • Keep child outputs compact so the parent can incorporate the result without consuming too much context.
  • Keep store: true while you build the workflow so you can inspect parent and child responses later.
  • Use the Responses input items page in the API Reference when you need to debug the exact items sent back to the model.

See also

Review making deep research with subagents for a concrete research-oriented implementation of this pattern. Review generate a response for the base Responses request pattern. Use manage conversations when you want the parent or child workflow to keep durable state outside a single response chain.