Use auto_routing when you want the gateway to choose between quantized, MoE, and dense variants based on request complexity. When sibling variants are not registered yet, use auto_routing_debug to verify the routing decision itself. The response metadata will include:
  • routed_model
  • auto_routing_debug
auto_routing_debug is a compact JSON string with the requested model, selected tier, reasoning effort, score, and reasons.
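Because the debug payload arrives as a JSON string rather than a nested object, it has to be decoded separately from the outer response body. As a minimal Python sketch (the payload below mirrors the example shown later on this page):

```python
import json

# auto_routing_debug is a JSON *string* inside response metadata,
# so it must be parsed in a second step after the response body.
debug_raw = (
    '{"requested_model":"meetkai:functionary-pt",'
    '"routed_model":"meetkai:functionary-pt",'
    '"desired_tier":"quantized",'
    '"reasoning_effort":"minimal",'
    '"score":-3,'
    '"reasons":["short prompt","simple transform task"]}'
)

debug = json.loads(debug_raw)
print(debug["desired_tier"])      # quantized
print(debug["reasoning_effort"])  # minimal
print(debug["score"])             # -3
```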

Run the validation

The production verification on March 31, 2026 used the Responses endpoint directly with a fixed model and an opt-in debug flag.
curl https://apigw.mka1.com/api/v1/llm/responses \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <mka1-api-key>' \
  --data '{
    "model": "meetkai:functionary-pt",
    "input": "Summarize this in one sentence: The quick brown fox jumps over the lazy dog.",
    "auto_routing": true,
    "auto_routing_debug": true,
    "max_output_tokens": 64
  }'
If the deployment is live, the response metadata includes a payload like this:
{
  "metadata": {
    "routed_model": "meetkai:functionary-pt",
    "auto_routing_debug": "{\"requested_model\":\"meetkai:functionary-pt\",\"routed_model\":\"meetkai:functionary-pt\",\"desired_tier\":\"quantized\",\"reasoning_effort\":\"minimal\",\"score\":-3,\"reasons\":[\"short prompt\",\"simple transform task\"]}"
  }
}
The routed_model may still equal the requested model if no compatible sibling family exists in production yet. That does not mean the heuristic failed; the evidence is in the desired_tier, reasoning_effort, score, and reasons fields of auto_routing_debug.
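One way to encode that interpretation is to check the heuristic against the debug payload alone, ignoring whether routed_model changed. This Python sketch is illustrative; the heuristic_matched helper and the expected_tier argument are assumptions of the example, not part of the API:

```python
import json

def heuristic_matched(metadata: dict, expected_tier: str) -> bool:
    """Check the routing heuristic via auto_routing_debug, ignoring
    whether routed_model actually changed (it cannot change until a
    compatible sibling family is registered)."""
    raw = metadata.get("auto_routing_debug")
    if raw is None:
        return False  # feature not deployed, or debug flag not set
    debug = json.loads(raw)
    return debug.get("desired_tier") == expected_tier

metadata = {
    "routed_model": "meetkai:functionary-pt",
    "auto_routing_debug":
        '{"desired_tier":"quantized","reasoning_effort":"minimal","score":-3}',
}
print(heuristic_matched(metadata, "quantized"))  # True
```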

Test method

The production check used six Responses requests against https://apigw.mka1.com/api/v1/llm/responses. Each request set:
  • model: "meetkai:functionary-pt"
  • auto_routing: true
  • auto_routing_debug: true
The matrix covered:
  1. Short English transform prompt
  2. Medium structured-output prompt
  3. Long incident-analysis prompt
  4. Forced tool-use prompt
  5. Short Portuguese transform prompt
  6. Long Portuguese incident-analysis prompt
For each response, the validation recorded:
  • HTTP status
  • metadata.routed_model
  • parsed metadata.auto_routing_debug
  • effective reasoning.effort
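The recording step above can be sketched as a small helper that pulls those four fields out of each parsed response body. The record_result function is hypothetical; the field names match the response excerpts shown below:

```python
import json

def record_result(name: str, http_status: int, body: dict) -> dict:
    """Collect the four fields the validation recorded for each request."""
    metadata = body.get("metadata", {})
    raw_debug = metadata.get("auto_routing_debug")
    return {
        "name": name,
        "httpStatus": http_status,
        "routed_model": metadata.get("routed_model"),
        "debug": json.loads(raw_debug) if raw_debug else None,
        "reasoning_effort": body.get("reasoning", {}).get("effort"),
    }

body = {
    "metadata": {
        "routed_model": "meetkai:functionary-pt",
        "auto_routing_debug": '{"desired_tier":"quantized","score":-3}',
    },
    "reasoning": {"effort": "minimal"},
}
result = record_result("simple_summary", 200, body)
print(result["debug"]["desired_tier"])  # quantized
```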

Live production results

These were the observed production results from March 31, 2026 after PR 321 deployed:
| Test case | Expected tier | Observed tier | Reasoning effort | Score | Key reasons |
|---|---|---|---|---|---|
| simple_summary | quantized | quantized | minimal | -3 | short prompt, simple transform task |
| structured_analysis_json | moe | moe | low | 2 | structured output requested, short prompt, complex reasoning markers in prompt |
| complex_incident_plan | dense | dense | high | 6 | large max_output_tokens, long prompt, complex reasoning markers in prompt |
| forced_code_interpreter | dense | dense | high | 5 | high-agency tools enabled, tool use required |
| portuguese_simple_transform | quantized | quantized | minimal | -3 | short prompt, simple transform task |
| portuguese_complex_incident | dense | dense | high | 5 | large max_output_tokens, complex reasoning markers in prompt |
All six requests returned 200 OK. All six responses included auto_routing_debug. The observed tier matched the expected tier for every case.

Raw response excerpts

The examples below are adapted from the live production log.

Short transform prompt

Request:
{
  "model": "meetkai:functionary-pt",
  "input": "Summarize this in one sentence: The quick brown fox jumps over the lazy dog.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 64
}
Observed response excerpt:
{
  "name": "simple_summary",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "minimal",
    "summary": null
  },
  "debug": {
    "desired_tier": "quantized",
    "reasoning_effort": "minimal",
    "score": -3,
    "reasons": [
      "short prompt",
      "simple transform task"
    ]
  },
  "error": null
}

Structured-output prompt

Request:
{
  "model": "meetkai:functionary-pt",
  "input": "Analyze three customer complaints, identify the top themes, and return a compact JSON object with themes and recommended actions.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 256,
  "text": {
    "format": {
      "type": "json_object"
    }
  }
}
Observed response excerpt:
{
  "name": "structured_analysis_json",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "low",
    "summary": null
  },
  "debug": {
    "desired_tier": "moe",
    "reasoning_effort": "low",
    "score": 2,
    "reasons": [
      "structured output requested",
      "short prompt",
      "complex reasoning markers in prompt"
    ]
  },
  "error": null
}

Long incident-analysis prompt

Request:
{
  "model": "meetkai:functionary-pt",
  "input": "You are reviewing a TypeScript production incident in a request router. Investigate the root cause, identify failure modes, compare remediation trade-offs, and produce a step-by-step refactor and rollout plan. Consider error handling, retries, observability, regression risk, and operational safeguards. Explain the likely root cause, list the most important contributing factors, compare at least two possible fixes, and recommend one with justification. Also call out any assumptions, open questions, and follow-up verification steps. Keep the reasoning structured and explicit so an engineer can execute the plan safely. You are reviewing a TypeScript production incident in a request router. Investigate the root cause, identify failure modes, compare remediation trade-offs, and produce a step-by-step refactor and rollout plan. Consider error handling, retries, observability, regression risk, and operational safeguards. Explain the likely root cause, list the most important contributing factors, compare at least two possible fixes, and recommend one with justification. Also call out any assumptions, open questions, and follow-up verification steps. Keep the reasoning structured and explicit so an engineer can execute the plan safely. You are reviewing a TypeScript production incident in a request router. Investigate the root cause, identify failure modes, compare remediation trade-offs, and produce a step-by-step refactor and rollout plan. Consider error handling, retries, observability, regression risk, and operational safeguards. Explain the likely root cause, list the most important contributing factors, compare at least two possible fixes, and recommend one with justification. Also call out any assumptions, open questions, and follow-up verification steps. Keep the reasoning structured and explicit so an engineer can execute the plan safely.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 4000
}
Observed response excerpt:
{
  "name": "complex_incident_plan",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "high",
    "summary": null
  },
  "debug": {
    "desired_tier": "dense",
    "reasoning_effort": "high",
    "score": 6,
    "reasons": [
      "large max_output_tokens",
      "long prompt",
      "complex reasoning markers in prompt"
    ]
  },
  "error": null
}

Forced tool-use prompt

Request:
{
  "model": "meetkai:functionary-pt",
  "input": "Use the code interpreter to inspect this issue and report what it finds.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 256,
  "tools": [
    {
      "type": "code_interpreter",
      "container": {
        "type": "auto"
      }
    }
  ],
  "tool_choice": {
    "type": "code_interpreter"
  }
}
Observed response excerpt:
{
  "name": "forced_code_interpreter",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "high",
    "summary": null
  },
  "debug": {
    "desired_tier": "dense",
    "reasoning_effort": "high",
    "score": 5,
    "reasons": [
      "high-agency tools enabled",
      "tool use required"
    ]
  },
  "error": null
}

Portuguese transform prompt

Request:
{
  "model": "meetkai:functionary-pt",
  "input": "Traduza e resuma este texto em português em uma frase curta.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 64
}
Observed response excerpt:
{
  "name": "portuguese_simple_transform",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "minimal",
    "summary": null
  },
  "debug": {
    "desired_tier": "quantized",
    "reasoning_effort": "minimal",
    "score": -3,
    "reasons": [
      "short prompt",
      "simple transform task"
    ]
  },
  "error": null
}

Portuguese incident-analysis prompt

Request:
{
  "model": "meetkai:functionary-pt",
  "input": "Investigue a causa raiz deste incidente em um serviço TypeScript, analise os trade-offs entre correções possíveis e produza um plano passo a passo para corrigir, refatorar e implantar com segurança. Considere tratamento de erros, observabilidade, risco de regressão e validação operacional. Explique a provável causa raiz, liste os fatores contribuintes mais importantes, compare pelo menos duas alternativas de correção e recomende uma com justificativa. Também destaque premissas, dúvidas em aberto e etapas de verificação posteriores. Investigue a causa raiz deste incidente em um serviço TypeScript, analise os trade-offs entre correções possíveis e produza um plano passo a passo para corrigir, refatorar e implantar com segurança. Considere tratamento de erros, observabilidade, risco de regressão e validação operacional. Explique a provável causa raiz, liste os fatores contribuintes mais importantes, compare pelo menos duas alternativas de correção e recomende uma com justificativa. Também destaque premissas, dúvidas em aberto e etapas de verificação posteriores. Investigue a causa raiz deste incidente em um serviço TypeScript, analise os trade-offs entre correções possíveis e produza um plano passo a passo para corrigir, refatorar e implantar com segurança. Considere tratamento de erros, observabilidade, risco de regressão e validação operacional. Explique a provável causa raiz, liste os fatores contribuintes mais importantes, compare pelo menos duas alternativas de correção e recomende uma com justificativa. Também destaque premissas, dúvidas em aberto e etapas de verificação posteriores.",
  "auto_routing": true,
  "auto_routing_debug": true,
  "max_output_tokens": 4000
}
Observed response excerpt:
{
  "name": "portuguese_complex_incident",
  "httpStatus": 200,
  "ok": true,
  "reasoning": {
    "effort": "high",
    "summary": null
  },
  "debug": {
    "desired_tier": "dense",
    "reasoning_effort": "high",
    "score": 5,
    "reasons": [
      "large max_output_tokens",
      "complex reasoning markers in prompt"
    ]
  },
  "error": null
}

Interpret the result

Use this checklist when validating a deployment:
  1. Confirm the response includes metadata.auto_routing_debug.
  2. Parse the JSON string and inspect desired_tier.
  3. Check that reasoning_effort matches the expected complexity level.
  4. Check that the reasons match the prompt features you intended to trigger.
  5. If sibling variants exist, also confirm that routed_model changes to the expected sibling.
If auto_routing_debug is missing, the deployed API image likely does not yet include the feature.
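The checklist above can be turned into a small assertion helper. This Python sketch is illustrative only: validate_routing and the expected dict are assumptions of the example, and the sibling check in step 5 applies only once sibling variants exist:

```python
import json

def validate_routing(metadata: dict, expected: dict) -> list[str]:
    """Return a list of checklist failures; an empty list means the check passed."""
    failures = []
    raw = metadata.get("auto_routing_debug")
    if raw is None:  # step 1: debug payload must be present
        return ["metadata.auto_routing_debug missing: feature likely not deployed"]
    debug = json.loads(raw)  # step 2: parse the JSON string
    if debug.get("desired_tier") != expected["tier"]:
        failures.append(f"desired_tier={debug.get('desired_tier')}, wanted {expected['tier']}")
    if debug.get("reasoning_effort") != expected["effort"]:  # step 3
        failures.append(f"reasoning_effort={debug.get('reasoning_effort')}, wanted {expected['effort']}")
    missing = set(expected.get("reasons", [])) - set(debug.get("reasons", []))  # step 4
    if missing:
        failures.append(f"missing reasons: {sorted(missing)}")
    sibling = expected.get("sibling")  # step 5: only when sibling variants exist
    if sibling and metadata.get("routed_model") != sibling:
        failures.append(f"routed_model={metadata.get('routed_model')}, wanted {sibling}")
    return failures

metadata = {
    "routed_model": "meetkai:functionary-pt",
    "auto_routing_debug": json.dumps({
        "desired_tier": "quantized",
        "reasoning_effort": "minimal",
        "score": -3,
        "reasons": ["short prompt", "simple transform task"],
    }),
}
print(validate_routing(metadata, {"tier": "quantized", "effort": "minimal",
                                  "reasons": ["short prompt"]}))  # []
```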

Notes

  • auto_routing_debug is intended for validation and rollout checks. It is opt-in and should not be enabled by default for normal production traffic.
  • auto_routing_debug is currently available on the Responses API.
  • Portuguese prompt heuristics are included in the current production routing logic, so short transform prompts and complex incident-analysis prompts can be validated in both English and Portuguese.