Validate auto routing

Use auto_routing when you want the gateway to choose between quantized, MoE, and dense variants based on request complexity. If sibling variants are not registered yet, set auto_routing_debug as well to verify the routing decision itself. With both flags enabled, the response metadata includes:

routed_model
auto_routing_debug

In the response, auto_routing_debug is a compact JSON string containing the requested model, routed model, selected tier, reasoning effort, score, and reasons.
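Because the field is a JSON string nested inside the JSON response, it has to be decoded a second time before its fields can be read. The following is a minimal sketch of that step; the helper name `parse_routing_debug` is hypothetical, and the sample metadata is copied from the example response in this guide.

```python
import json
from typing import Optional


def parse_routing_debug(metadata: dict) -> Optional[dict]:
    """Decode metadata.auto_routing_debug, or return None when it is absent."""
    raw = metadata.get("auto_routing_debug")
    if raw is None:
        return None
    # The value is a JSON string embedded in the metadata object,
    # so it needs its own json.loads call.
    return json.loads(raw)


# Sample metadata taken from the example response in this guide.
sample_metadata = {
    "routed_model": "meetkai:functionary-pt",
    "auto_routing_debug": (
        '{"requested_model":"meetkai:functionary-pt",'
        '"routed_model":"meetkai:functionary-pt",'
        '"desired_tier":"quantized","reasoning_effort":"minimal",'
        '"score":-3,"reasons":["short prompt","simple transform task"]}'
    ),
}

debug = parse_routing_debug(sample_metadata)
print(debug["desired_tier"], debug["reasoning_effort"], debug["score"])
```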

Run the validation

The production verification on March 31, 2026, called the Responses endpoint directly with a fixed model and the opt-in debug flag.

curl https://apigw.mka1.com/api/v1/llm/responses \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <mka1-api-key>' \
  --data '{
    "model": "meetkai:functionary-pt",
    "input": "Summarize this in one sentence: The quick brown fox jumps over the lazy dog.",
    "auto_routing": true,
    "auto_routing_debug": true,
    "max_output_tokens": 64
  }'
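For scripted checks, the same request can be built in Python. This is a sketch using only the standard library; the function name `build_validation_request` is hypothetical, and the send step is left commented out so that the snippet has no side effects.

```python
import json
import urllib.request

RESPONSES_URL = "https://apigw.mka1.com/api/v1/llm/responses"


def build_validation_request(
    api_key: str, prompt: str, max_output_tokens: int = 64
) -> urllib.request.Request:
    """Build (but do not send) a Responses request with routing debug enabled."""
    payload = {
        "model": "meetkai:functionary-pt",
        "input": prompt,
        "auto_routing": True,
        "auto_routing_debug": True,
        "max_output_tokens": max_output_tokens,
    }
    return urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_validation_request(
    "<mka1-api-key>",
    "Summarize this in one sentence: The quick brown fox jumps over the lazy dog.",
)

# To send the request, uncomment the lines below and supply a real key:
# with urllib.request.urlopen(request) as resp:
#     body = json.load(resp)
#     print(body.get("metadata"))
```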

If the deployment is live, the response metadata includes a payload like this:

{
  "metadata": {
    "routed_model": "meetkai:functionary-pt",
    "auto_routing_debug": "{\"requested_model\":\"meetkai:functionary-pt\",\"routed_model\":\"meetkai:functionary-pt\",\"desired_tier\":\"quantized\",\"reasoning_effort\":\"minimal\",\"score\":-3,\"reasons\":[\"short prompt\",\"simple transform task\"]}"
  }
}

routed_model can still equal the requested model when no compatible sibling family exists in production yet. That does not mean the heuristic failed: the evidence of a routing decision is the desired_tier, reasoning_effort, score, and reasons fields in auto_routing_debug.
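One way to keep these two signals apart in a validation script is to report the heuristic's decision separately from the model that actually served the request. The sketch below assumes the metadata shape shown above; `describe_routing` is a hypothetical helper, not part of the gateway.

```python
import json


def describe_routing(metadata: dict, requested_model: str) -> str:
    """Summarize the routing decision separately from the served model."""
    raw = metadata.get("auto_routing_debug")
    if raw is None:
        return "no debug payload"
    tier = json.loads(raw)["desired_tier"]
    routed = metadata.get("routed_model")
    if routed == requested_model:
        # The heuristic still made a decision; the request simply stayed put.
        return f"heuristic chose {tier}; request stayed on {requested_model}"
    return f"heuristic chose {tier}; request rerouted to {routed}"


metadata = {
    "routed_model": "meetkai:functionary-pt",
    "auto_routing_debug": '{"desired_tier":"quantized","score":-3}',
}
summary = describe_routing(metadata, "meetkai:functionary-pt")
print(summary)
```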

Test method

The production check used six Responses requests against https://apigw.mka1.com/api/v1/llm/responses. Each request set:

model: "meetkai:functionary-pt"
auto_routing: true
auto_routing_debug: true

The matrix covered:

Short English transform prompt
Medium structured-output prompt
Long incident-analysis prompt
Forced tool-use prompt
Short Portuguese transform prompt
Long Portuguese incident-analysis prompt

For each response, the validation recorded:

HTTP status
metadata.routed_model
parsed metadata.auto_routing_debug
effective reasoning.effort
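The four recorded fields can be collected with a small helper like the one sketched here. It assumes the response body exposes top-level `metadata` and `reasoning` objects, as the field paths above suggest; the name `record_result` and the output keys are illustrative only.

```python
import json
from typing import Any, Dict


def record_result(name: str, http_status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the fields the validation records for one response."""
    metadata = body.get("metadata") or {}
    raw_debug = metadata.get("auto_routing_debug")
    return {
        "name": name,
        "http_status": http_status,
        "routed_model": metadata.get("routed_model"),
        # Parsed form of the JSON string, or None when the field is missing.
        "debug": json.loads(raw_debug) if raw_debug else None,
        "reasoning_effort": (body.get("reasoning") or {}).get("effort"),
    }


# Hand-written body that mirrors the documented fields, for illustration.
sample_body = {
    "metadata": {
        "routed_model": "meetkai:functionary-pt",
        "auto_routing_debug": '{"desired_tier":"quantized","reasoning_effort":"minimal","score":-3}',
    },
    "reasoning": {"effort": "minimal", "summary": None},
}
row = record_result("simple_summary", 200, sample_body)
print(row)
```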

Live production results

These are the production results observed on March 31, 2026, after PR 321 was deployed:

| Test case | Expected tier | Observed tier | Reasoning effort | Score | Key reasons |
| --- | --- | --- | --- | --- | --- |
| `simple_summary` | `quantized` | `quantized` | `minimal` | `-3` | `short prompt`, `simple transform task` |
| `structured_analysis_json` | `moe` | `moe` | `low` | `2` | `structured output requested`, `short prompt`, `complex reasoning markers in prompt` |
| `complex_incident_plan` | `dense` | `dense` | `high` | `6` | `large max_output_tokens`, `long prompt`, `complex reasoning markers in prompt` |
| `forced_code_interpreter` | `dense` | `dense` | `high` | `5` | `high-agency tools enabled`, `tool use required` |
| `portuguese_simple_transform` | `quantized` | `quantized` | `minimal` | `-3` | `short prompt`, `simple transform task` |
| `portuguese_complex_incident` | `dense` | `dense` | `high` | `5` | `large max_output_tokens`, `complex reasoning markers in prompt` |

All six requests returned 200 OK. All six responses included auto_routing_debug. The observed tier matched the expected tier for every case.
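The comparison can be reproduced from the table with a few lines of Python. The rows below are copied from the results above; the per-tier score ranges printed at the end describe only these six samples and should not be read as the gateway's actual thresholds.

```python
from collections import defaultdict

# (test case, expected tier, observed tier, reasoning effort, score)
RESULTS = [
    ("simple_summary", "quantized", "quantized", "minimal", -3),
    ("structured_analysis_json", "moe", "moe", "low", 2),
    ("complex_incident_plan", "dense", "dense", "high", 6),
    ("forced_code_interpreter", "dense", "dense", "high", 5),
    ("portuguese_simple_transform", "quantized", "quantized", "minimal", -3),
    ("portuguese_complex_incident", "dense", "dense", "high", 5),
]

# Cases where the observed tier differs from the expected tier.
mismatches = [name for name, expected, observed, _, _ in RESULTS if expected != observed]

# Group the observed scores by tier to see how the samples are spread.
scores_by_tier = defaultdict(list)
for _, _, observed, _, score in RESULTS:
    scores_by_tier[observed].append(score)

print(f"{len(RESULTS) - len(mismatches)}/{len(RESULTS)} cases matched")
for tier, scores in scores_by_tier.items():
    print(f"{tier}: scores {min(scores)} to {max(scores)}")
```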

Raw response excerpts

The examples below are adapted from the live production log.

Short transform prompt

Request:

{
  "model": "meetkai:functionary-pt",
  "input": "Summarize this in one sentence: The quick brown fox jumps over the lazy dog.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 64
}

Observed response excerpt:

{
  "name": "simple_summary",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "minimal",
    "summary": null
  },
  "debug": {
    "desired_tier": "quantized",
    "reasoning_effort": "minimal",
    "score": -3,
    "reasons": [
      "short prompt",
      "simple transform task"
    ]
  },
  "error": null
}

Structured-output prompt

Request:

{
  "model": "meetkai:functionary-pt",
  "input": "Analyze three customer complaints, identify the top themes, and return a compact JSON object with themes and recommended actions.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 256,
  "text": {
    "format": {
      "type": "json_object"
    }
  }
}

Observed response excerpt:

{
  "name": "structured_analysis_json",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "low",
    "summary": null
  },
  "debug": {
    "desired_tier": "moe",
    "reasoning_effort": "low",
    "score": 2,
    "reasons": [
      "structured output requested",
      "short prompt",
      "complex reasoning markers in prompt"
    ]
  },
  "error": null
}

Long incident-analysis prompt

Request:

{
  "model": "meetkai:functionary-pt",
  "input": "You are reviewing a TypeScript production incident in a request router. Investigate the root cause, identify failure modes, compare remediation trade-offs, and produce a step-by-step refactor and rollout plan. Consider error handling, retries, observability, regression risk, and operational safeguards. Explain the likely root cause, list the most important contributing factors, compare at least two possible fixes, and recommend one with justification. Also call out any assumptions, open questions, and follow-up verification steps. Keep the reasoning structured and explicit so an engineer can execute the plan safely. You are reviewing a TypeScript production incident in a request router. Investigate the root cause, identify failure modes, compare remediation trade-offs, and produce a step-by-step refactor and rollout plan. Consider error handling, retries, observability, regression risk, and operational safeguards. Explain the likely root cause, list the most important contributing factors, compare at least two possible fixes, and recommend one with justification. Also call out any assumptions, open questions, and follow-up verification steps. Keep the reasoning structured and explicit so an engineer can execute the plan safely. You are reviewing a TypeScript production incident in a request router. Investigate the root cause, identify failure modes, compare remediation trade-offs, and produce a step-by-step refactor and rollout plan. Consider error handling, retries, observability, regression risk, and operational safeguards. Explain the likely root cause, list the most important contributing factors, compare at least two possible fixes, and recommend one with justification. Also call out any assumptions, open questions, and follow-up verification steps. Keep the reasoning structured and explicit so an engineer can execute the plan safely.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 4000
}

Observed response excerpt:

{
  "name": "complex_incident_plan",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "high",
    "summary": null
  },
  "debug": {
    "desired_tier": "dense",
    "reasoning_effort": "high",
    "score": 6,
    "reasons": [
      "large max_output_tokens",
      "long prompt",
      "complex reasoning markers in prompt"
    ]
  },
  "error": null
}

Forced tool-use prompt

Request:

{
  "model": "meetkai:functionary-pt",
  "input": "Use the code interpreter to inspect this issue and report what it finds.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 256,
  "tools": [
    {
      "type": "code_interpreter",
      "container": {
        "type": "auto"
      }
    }
  ],
  "tool_choice": {
    "type": "code_interpreter"
  }
}

Observed response excerpt:

{
  "name": "forced_code_interpreter",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "high",
    "summary": null
  },
  "debug": {
    "desired_tier": "dense",
    "reasoning_effort": "high",
    "score": 5,
    "reasons": [
      "high-agency tools enabled",
      "tool use required"
    ]
  },
  "error": null
}

Portuguese transform prompt

Request:

{
  "model": "meetkai:functionary-pt",
  "input": "Traduza e resuma este texto em português em uma frase curta.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 64
}

Observed response excerpt:

{
  "name": "portuguese_simple_transform",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "minimal",
    "summary": null
  },
  "debug": {
    "desired_tier": "quantized",
    "reasoning_effort": "minimal",
    "score": -3,
    "reasons": [
      "short prompt",
      "simple transform task"
    ]
  },
  "error": null
}

Portuguese incident-analysis prompt

Request:

{
  "model": "meetkai:functionary-pt",
  "input": "Investigue a causa raiz deste incidente em um serviço TypeScript, analise os trade-offs entre correções possíveis e produza um plano passo a passo para corrigir, refatorar e implantar com segurança. Considere tratamento de erros, observabilidade, risco de regressão e validação operacional. Explique a provável causa raiz, liste os fatores contribuintes mais importantes, compare pelo menos duas alternativas de correção e recomende uma com justificativa. Também destaque premissas, dúvidas em aberto e etapas de verificação posteriores. Investigue a causa raiz deste incidente em um serviço TypeScript, analise os trade-offs entre correções possíveis e produza um plano passo a passo para corrigir, refatorar e implantar com segurança. Considere tratamento de erros, observabilidade, risco de regressão e validação operacional. Explique a provável causa raiz, liste os fatores contribuintes mais importantes, compare pelo menos duas alternativas de correção e recomende uma com justificativa. Também destaque premissas, dúvidas em aberto e etapas de verificação posteriores. Investigue a causa raiz deste incidente em um serviço TypeScript, analise os trade-offs entre correções possíveis e produza um plano passo a passo para corrigir, refatorar e implantar com segurança. Considere tratamento de erros, observabilidade, risco de regressão e validação operacional. Explique a provável causa raiz, liste os fatores contribuintes mais importantes, compare pelo menos duas alternativas de correção e recomende uma com justificativa. Também destaque premissas, dúvidas em aberto e etapas de verificação posteriores.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 4000
}

Observed response excerpt:

{
  "name": "portuguese_complex_incident",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "high",
    "summary": null
  },
  "debug": {
    "desired_tier": "dense",
    "reasoning_effort": "high",
    "score": 5,
    "reasons": [
      "large max_output_tokens",
      "complex reasoning markers in prompt"
    ]
  },
  "error": null
}

Interpret the result

Use this checklist when validating a deployment:

Confirm the response includes metadata.auto_routing_debug.
Parse the JSON string and inspect desired_tier.
Check that reasoning_effort matches the expected complexity level.
Check that the reasons match the prompt features you intended to trigger.
If sibling variants exist, also confirm that routed_model changes to the expected sibling.

If auto_routing_debug is missing, the deployed API image likely does not yet include the feature.
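The checklist can be turned into a small function that returns a list of problems, with an empty list meaning that every check passed. This is a sketch under the metadata shape shown earlier; `validate_routing` and its parameters are hypothetical names chosen for illustration.

```python
import json
from typing import List, Optional


def validate_routing(
    metadata: dict,
    expected_tier: str,
    expected_effort: str,
    expected_reasons: List[str],
    expected_model: Optional[str] = None,
) -> List[str]:
    """Apply the checklist to one response's metadata and list any problems."""
    raw = metadata.get("auto_routing_debug")
    if raw is None:
        return ["metadata.auto_routing_debug is missing"]

    debug = json.loads(raw)
    problems = []
    if debug.get("desired_tier") != expected_tier:
        problems.append(f"desired_tier is {debug.get('desired_tier')!r}")
    if debug.get("reasoning_effort") != expected_effort:
        problems.append(f"reasoning_effort is {debug.get('reasoning_effort')!r}")
    missing = [r for r in expected_reasons if r not in debug.get("reasons", [])]
    if missing:
        problems.append(f"missing reasons: {missing}")
    # Only check routed_model when a sibling variant is expected to exist.
    if expected_model is not None and metadata.get("routed_model") != expected_model:
        problems.append(f"routed_model is {metadata.get('routed_model')!r}")
    return problems


metadata = {
    "routed_model": "meetkai:functionary-pt",
    "auto_routing_debug": (
        '{"desired_tier":"quantized","reasoning_effort":"minimal",'
        '"score":-3,"reasons":["short prompt","simple transform task"]}'
    ),
}
problems = validate_routing(metadata, "quantized", "minimal", ["short prompt"])
print(problems or "all checks passed")
```

Leaving `expected_model` unset skips the last checklist item, which matches the case where no sibling variants are registered yet.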

Notes

auto_routing_debug is intended for validation and rollout checks. It is opt-in and should not be enabled by default for normal production traffic.
auto_routing_debug is currently available on the Responses API.
Portuguese prompt heuristics are included in the current production routing logic, so short transform prompts and complex incident-analysis prompts can be validated in both English and Portuguese.
