Deep research with subagents

Use this pattern when one model doing all the research in a single response starts to hit context limits. Instead of letting one response consume every web_search result directly, use a parent response to delegate focused research tasks to child responses. Each child performs its own searches and returns a compact memo. The parent only consumes those memos, which lets the overall workflow search much more aggressively without flooding the parent context.

This is a specialized version of the general subagent pattern. Here the parent is a research orchestrator and the children are research workers.

Deep research flow
-> parent response plans the research
-> parent calls spawn_subagent several times
-> your app runs child responses with web_search
-> children return compact research memos
-> your app sends all child memos back as function_call_output
-> parent resumes, decides whether to do another wave, then answers

Why use subagents for deep research

In a classic single-agent deep research loop, one response keeps calling web_search. That works until the response has seen too many search results and starts to run out of useful context budget. Subagents fix that by splitting the work:

The parent keeps the global plan and final synthesis.
Each child handles one focused research task and can do several searches of its own.
The parent sees only the child memo, not every raw search result.
The parent can delegate a second or third wave if the first wave is weak or conflicting.
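
To make the context savings concrete, the following back-of-the-envelope sketch compares how many tokens land in one shared context with how many reach the parent when children return memos. Every number and name in it (ResearchShape, the token sizes, the task counts) is an illustrative assumption, not a measurement of any real model or search tool.

```typescript
// Illustrative estimate only: all sizes below are assumptions, not measurements.
type ResearchShape = {
  tasks: number; // focused research tasks
  searchesPerTask: number; // web searches run for each task
  tokensPerSearch: number; // raw result tokens one search adds to a context
  tokensPerMemo: number; // tokens in one compact child memo
};

// Single agent: every raw search result lands in the one shared context.
function singleAgentTokens(s: ResearchShape): number {
  return s.tasks * s.searchesPerTask * s.tokensPerSearch;
}

// Subagents: the parent context grows by only one memo per task.
function parentTokens(s: ResearchShape): number {
  return s.tasks * s.tokensPerMemo;
}

// Each child context holds only the searches for its own task.
function perChildTokens(s: ResearchShape): number {
  return s.searchesPerTask * s.tokensPerSearch;
}

const shape: ResearchShape = {
  tasks: 6,
  searchesPerTask: 5,
  tokensPerSearch: 4000,
  tokensPerMemo: 800,
};

console.log(singleAgentTokens(shape)); // 120000
console.log(parentTokens(shape)); // 4800
console.log(perChildTokens(shape)); // 20000
```

Under these assumed sizes, no single context ever holds more than one child's share of the raw results, and the parent holds only the memos.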

Design the parent and child differently

The parent should not search directly. Its job is to decompose the request, launch multiple subagents in parallel, and decide whether another wave is needed. The children should search directly. Their job is to do the focused work, push past weak initial results, and return compact evidence for the parent. A good split looks like this:

Parent: planning, task decomposition, conflict resolution, final answer.
Child: query refinement, repeated search, source evaluation, memo writing.

Define the delegation tool

Keep the tool schema small. Pass only the child task, optional instructions, and an optional model override.

const spawnSubagentTool = {
  type: "function" as const,
  name: "spawn_subagent",
  description: "Delegate a focused research task to a child response and return a compact memo.",
  strict: true,
  parameters: {
    type: "object",
    properties: {
      task: { type: "string" },
      instructions: { type: "string" },
      model: { type: "string" },
    },
    required: ["task"],
    additionalProperties: false,
  },
};

type SpawnSubagentArgs = {
  task: string;
  instructions?: string;
  model?: string;
};
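
The tool arguments arrive as a JSON string written by the model, so a type assertion alone does not guarantee their shape. A minimal sketch of runtime validation follows. parseSpawnSubagentArgs is a hypothetical helper, not part of the SDK, and treating null or empty optional fields as absent is an assumption about what the model may emit.

```typescript
// Repeated from above so the sketch is self-contained.
type SpawnSubagentArgs = {
  task: string;
  instructions?: string;
  model?: string;
};

// Hypothetical helper: parse and validate the raw arguments of one tool call.
function parseSpawnSubagentArgs(rawArguments: string): SpawnSubagentArgs {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawArguments);
  } catch {
    throw new Error("spawn_subagent arguments are not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("spawn_subagent arguments must be a JSON object");
  }

  const record = parsed as Record<string, unknown>;
  if (typeof record.task !== "string" || record.task.trim() === "") {
    throw new Error("spawn_subagent requires a non-empty string task");
  }

  const args: SpawnSubagentArgs = { task: record.task.trim() };
  // Treat null or empty optional fields as absent so the defaults apply.
  if (typeof record.instructions === "string" && record.instructions.trim() !== "") {
    args.instructions = record.instructions;
  }
  if (typeof record.model === "string" && record.model.trim() !== "") {
    args.model = record.model;
  }
  return args;
}
```

A caller could use this in place of a bare JSON.parse and report the thrown message back to the parent as the tool output, so that one malformed call does not take down the whole wave.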

Give the parent an orchestration prompt

The parent prompt should force decomposition and breadth. For deep research, tell the parent to use several subagents by default and to launch another wave when the first one is weak.

const DEEP_RESEARCH_PARENT_INSTRUCTIONS = `
You are a deep research orchestrator.

You do not research directly.
Use the spawn_subagent tool to delegate focused research tasks to child responses.

Delegation rules:
- For straightforward factual lookups, launch at least 3 subagents before answering.
- For most research tasks, launch at least 4 subagents before answering.
- For broad, ambiguous, or high-stakes research, launch 5-8 subagents in waves.
- Launch multiple subagents in the same turn whenever their work can proceed independently.
- If the first wave is incomplete, repetitive, weak, or conflicting, launch another wave instead of guessing.

Delegation examples:
- One subagent for the most direct answer path.
- One subagent focused on primary or official sources.
- One subagent focused on corroboration or contradictions.
- Additional subagents for timeline, geography, stakeholder perspective, technical detail, or recent developments.

After all tool results return, synthesize the evidence into one answer for the user.
`;

Give the child a stricter research prompt

The child prompt should restore the search intensity that single-agent deep research often has naturally. Without this, a child may stop after one weak search and return a memo saying what it would do next.

const DEEP_RESEARCH_CHILD_INSTRUCTIONS = `
You are a specialist deep research subagent.

You have the web_search tool and should use it aggressively.
Focus only on the delegated task.

Search discipline:
- For non-trivial tasks, do at least 3 distinct web searches before returning.
- If the first search is weak, empty, generic, or mostly directory results, keep searching.
- Do not return "next steps". Execute those searches yourself before returning.
- Prefer primary, official, and otherwise authoritative sources.
- Treat directories and profile sites as leads to corroborate, not sole proof when a stronger source should exist.

Return a compact memo for another model:
1. Key findings
2. Evidence and sources
3. Open questions or caveats
`;
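
Because the child prompt asks for three named sections, the app can cheaply check whether a returned memo follows that structure. The sketch below is one possible check. missingMemoSections and the loose patterns are assumptions for illustration, and a memo can contain all three headings and still be weak.

```typescript
// Section headings the child prompt asks for, with a loose pattern for each.
const MEMO_SECTIONS: Array<{ name: string; pattern: RegExp }> = [
  { name: "Key findings", pattern: /key findings/i },
  { name: "Evidence and sources", pattern: /evidence (and|&) sources/i },
  { name: "Open questions or caveats", pattern: /open questions|caveats/i },
];

// Hypothetical helper: list the requested sections a memo does not mention.
function missingMemoSections(memo: string): string[] {
  return MEMO_SECTIONS.filter((section) => !section.pattern.test(memo)).map(
    (section) => section.name,
  );
}
```

An empty result means every requested section is mentioned somewhere in the memo. A non-empty result could be treated as one more reason to continue the child.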

Run the child with web_search

Each child is just another Responses request. Give it the delegated task as input and the web_search tool.

import { SDK } from "@meetkai/mka1";

const sdk = new SDK({
  bearerAuth: process.env.MKA1_API_KEY!,
});

async function runResearchChild(args: SpawnSubagentArgs) {
  const child = await sdk.llm.responses.create({
    model: args.model ?? "auto",
    instructions: args.instructions ?? DEEP_RESEARCH_CHILD_INSTRUCTIONS,
    input: args.task,
    tools: [
      {
        type: "web_search",
        userLocation: { country: "pakistan" },
        searchContextSize: "medium",
      },
    ],
    tool_choice: "auto",
    parallel_tool_calls: true,
    max_tool_calls: 60,
    store: true,
  });

  return {
    response_id: child.id,
    output_text: child.outputText,
  };
}

Continue weak children instead of accepting them

For deep research, do not accept a child result just because the child returned text. If the child only searched once, produced a short memo, or said things like “next steps” or “I need to search more”, continue that same child with previous_response_id.

const MIN_CHILD_SEARCHES = 3;
const MAX_CHILD_PASSES = 3;
const WEAK_MEMO_PATTERNS = [
  /next steps/i,
  /let me try/i,
  /need to search/i,
  /should check/i,
  /didn['’]t find/i, // match both straight and curly apostrophes
  /no direct results/i,
];

function isWeakMemo(outputText: string) {
  const trimmed = outputText.trim();
  if (!trimmed) return true;
  if (trimmed.length < 250) return true;
  return WEAK_MEMO_PATTERNS.some((pattern) => pattern.test(trimmed));
}

function buildChildContinuationPrompt(args: {
  task: string;
  searchCount: number;
  previousOutput: string;
}) {
  return [
    `Continue the delegated research task: ${args.task}`,
    `You have only completed ${args.searchCount} web searches so far.`,
    `Do at least ${MIN_CHILD_SEARCHES} distinct web searches before returning unless you already have a direct answer from authoritative evidence.`,
    `Do not return "next steps" or say that more searching is needed. Execute the additional searches yourself.`,
    `Prioritize primary, official, and otherwise authoritative sources. Use directories only as leads to corroborate.`,
    ``,
    `Previous memo:`,
    args.previousOutput,
  ].join("\n");
}

async function runResearchChildWithContinuation(args: SpawnSubagentArgs) {
  let previous_response_id: string | undefined;
  let nextInput = args.task;
  let totalSearchCount = 0;

  for (let pass = 0; pass < MAX_CHILD_PASSES; pass += 1) {
    const child = await sdk.llm.responses.create({
      model: args.model ?? "auto",
      instructions: args.instructions ?? DEEP_RESEARCH_CHILD_INSTRUCTIONS,
      ...(previous_response_id
        ? { previous_response_id, input: nextInput }
        : { input: nextInput }),
      tools: [
        {
          type: "web_search",
          userLocation: { country: "pakistan" },
          searchContextSize: "medium",
        },
      ],
      tool_choice: "auto",
      parallel_tool_calls: true,
      max_tool_calls: 60,
      store: true,
    });

    const passSearchCount = countCompletedWebSearches(child.output);
    totalSearchCount += passSearchCount;

    if (totalSearchCount >= MIN_CHILD_SEARCHES && !isWeakMemo(child.outputText)) {
      return {
        response_id: child.id,
        output_text: child.outputText,
        search_count: totalSearchCount,
      };
    }

    previous_response_id = child.id;
    nextInput = buildChildContinuationPrompt({
      task: args.task,
      searchCount: totalSearchCount,
      previousOutput: child.outputText,
    });
  }

  throw new Error("Child response remained too weak after multiple research passes");
}

The exact countCompletedWebSearches helper depends on whether you are using streamed events or stored output items. The important idea is to count actual completed search operations, not just turns.
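
For stored output items, a minimal sketch of that helper might look like the following. The item type name "web_search_call" and the status value "completed" are assumptions about the stored output format, so check them against the output types your SDK version actually returns.

```typescript
// Assumed shape: each stored output item has a `type`, and completed searches
// appear as { type: "web_search_call", status: "completed" }.
type OutputItemLike = { type: string; status?: string };

// Count only search calls that actually finished, not turns or attempts.
function countCompletedWebSearches(output: ReadonlyArray<OutputItemLike>): number {
  return output.filter(
    (item) => item.type === "web_search_call" && item.status === "completed",
  ).length;
}
```

Failed or in-progress search items are deliberately left out of the count, so a child cannot reach the minimum with searches that returned nothing.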

Run a parent wave, then fan in all child memos

The parent should be allowed to emit several spawn_subagent calls in one turn. Treat those tool calls as a batch. Start every child, wait for every child to finish, and then resume the parent once with all child outputs.

export async function runDeepResearch(input: string) {
  let response = await sdk.llm.responses.create({
    model: "meetkai:functionary-urdu-large",
    instructions: DEEP_RESEARCH_PARENT_INSTRUCTIONS,
    input,
    tools: [spawnSubagentTool],
    tool_choice: "auto",
    parallel_tool_calls: true,
    max_tool_calls: 18,
    store: true,
  });

  while (true) {
    const toolCalls = response.output.filter(
      (item): item is {
        type: "function_call";
        name: string;
        call_id: string;
        arguments: string;
      } => item.type === "function_call" && item.name === "spawn_subagent",
    );

    if (toolCalls.length === 0) {
      return response.outputText;
    }

    const toolOutputs = await Promise.all(
      toolCalls.map(async (toolCall) => {
        const args = JSON.parse(toolCall.arguments) as SpawnSubagentArgs;
        const childResult = await runResearchChildWithContinuation(args);

        return {
          type: "function_call_output" as const,
          call_id: toolCall.call_id,
          output: JSON.stringify(childResult),
        };
      }),
    );

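    // Resume the parent once with the whole batch. Instructions are passed again
    // on the assumption that they are not inherited through previous_response_id.
    response = await sdk.llm.responses.create({
      model: response.model,
      instructions: DEEP_RESEARCH_PARENT_INSTRUCTIONS,
      previous_response_id: response.id,
      input: toolOutputs,
      tools: [spawnSubagentTool],
      tool_choice: "auto",
      parallel_tool_calls: true,
      max_tool_calls: 18,
      store: true,
    });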
  }
}
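
In the loop above, one rejected child rejects Promise.all and aborts the whole run, which discards the memos of every child that did finish. That can happen when a child stays weak after every pass or when its arguments fail to parse. One option, sketched below, is to wait with Promise.allSettled and turn failures into error payloads so that every call_id still receives an output. toToolOutputs is a hypothetical helper, and how well the parent reacts to an error payload is an assumption you would need to test.

```typescript
type FunctionCallLike = { call_id: string };
type FunctionCallOutput = {
  type: "function_call_output";
  call_id: string;
  output: string;
};

// Hypothetical helper: pair each tool call with its settled child result.
// Fulfilled results are passed through; rejections become { error } payloads.
function toToolOutputs<T>(
  toolCalls: ReadonlyArray<FunctionCallLike>,
  settled: ReadonlyArray<PromiseSettledResult<T>>,
): FunctionCallOutput[] {
  if (toolCalls.length !== settled.length) {
    throw new Error("Each tool call needs exactly one settled result");
  }
  return toolCalls.map((toolCall, index) => {
    const result = settled[index];
    const payload =
      result.status === "fulfilled"
        ? result.value
        : {
            error:
              result.reason instanceof Error
                ? result.reason.message
                : String(result.reason),
          };
    return {
      type: "function_call_output" as const,
      call_id: toolCall.call_id,
      output: JSON.stringify(payload),
    };
  });
}
```

In runDeepResearch, this would mean collecting the child promises with Promise.allSettled instead of Promise.all and passing the settled array to toToolOutputs. The parent still resumes only once, with a complete batch.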

This is the core deep research loop:

Parent creates a research plan.
Parent emits a batch of spawn_subagent calls.
Your app runs all children in parallel.
Weak children continue into another pass.
Your app sends all final child memos back together.
Parent either launches another wave or answers.

Stream and log the research process

Deep research is much easier to debug if you log the child search lifecycle. Useful logs include:

parent response created
each delegated task
each child search query
each child search result batch
each child memo preview
when a weak child is continued into another pass
parent resume with the final child batch
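
One way to capture the events listed above is a small discriminated union plus a formatter that emits one JSON line per event. The event names, fields, and the 200-character preview length below are all assumptions chosen for illustration. Adapt them to whatever logging setup you already use.

```typescript
// Hypothetical event shapes, one per log line in the list above.
type ResearchLogEvent =
  | { kind: "parent_created"; responseId: string }
  | { kind: "task_delegated"; callId: string; task: string }
  | { kind: "child_search"; callId: string; query: string }
  | { kind: "child_search_results"; callId: string; resultCount: number }
  | { kind: "child_memo"; callId: string; memo: string }
  | { kind: "child_continued"; callId: string; pass: number; searchCount: number }
  | { kind: "parent_resumed"; responseId: string; childCount: number };

const MEMO_PREVIEW_CHARS = 200; // assumed preview length

// Format one event as a single JSON line, truncating memos to a preview.
function formatLogLine(event: ResearchLogEvent, now: Date = new Date()): string {
  const payload =
    event.kind === "child_memo"
      ? { ...event, memo: event.memo.slice(0, MEMO_PREVIEW_CHARS) }
      : event;
  return JSON.stringify({ ts: now.toISOString(), ...payload });
}

console.log(formatLogLine({ kind: "task_delegated", callId: "call_1", task: "Find primary sources" }));
```

Keying child events by callId makes it possible to trace one delegated task from the parent's tool call through each search to the memo that came back.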

This lets you distinguish orchestration failures from research-quality failures. For example, if the system answers badly but the logs show only one child with one search, that is a prompt or continuation problem, not a transport problem.

Practical rules for deep research

Use parallel_tool_calls: true on the parent so it can launch several subagents in one turn.
Keep child outputs compact. The parent should receive evidence summaries, not raw research dumps.
Prefer several focused child tasks over one vague child task.
If a child mostly finds directories or generic landing pages, keep pushing that child instead of accepting the first memo.
Do not resume the parent early with only part of a child batch.
Keep store: true on while developing so you can inspect parent and child responses afterward.
If you are acting on behalf of an end user, send the same X-On-Behalf-Of value on parent and child requests.
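
As a guard for the compact-output rule above, the app can cap each memo before it reaches the parent. The sketch below is one simple approach. clampMemo and the 4000-character default are assumptions, and cutting text is a blunt fallback compared with getting the child to write a shorter memo in the first place.

```typescript
const MAX_MEMO_CHARS = 4000; // assumed budget per child memo
const TRUNCATION_MARKER = "\n[memo truncated]";

// Hypothetical helper: cap a memo, cutting at a line break where possible.
function clampMemo(memo: string, maxChars: number = MAX_MEMO_CHARS): string {
  const trimmed = memo.trim();
  if (trimmed.length <= maxChars) return trimmed;

  // Leave room for the marker so the result stays within maxChars.
  const room = Math.max(0, maxChars - TRUNCATION_MARKER.length);
  const head = trimmed.slice(0, room);

  // Prefer ending on a whole line rather than mid-sentence.
  const lastBreak = head.lastIndexOf("\n");
  const cut = lastBreak > 0 ? head.slice(0, lastBreak) : head;
  return cut.trimEnd() + TRUNCATION_MARKER;
}
```

The marker tells the parent that evidence was cut, which it can take into account when deciding whether to launch another wave.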

Common pitfalls

Too few subagents: the parent synthesizes before the research has real breadth.
Weak children: a child returns after one weak search with “next steps” instead of doing more work.
Oversized child memos: the parent gains little because the child passes back too much raw context.
Early fan-in: resuming the parent before every child in a batch has finished makes the workflow less predictable.
Source quality drift: children may overuse directories, profile sites, and aggregators unless the prompt explicitly tells them to prioritize primary sources.
