Use auto_routing when you want the gateway to choose between quantized, MoE, and dense variants based on request complexity.
When sibling variants are not registered yet, use auto_routing_debug to verify the routing decision itself.
The response metadata will include:
- routed_model
- auto_routing_debug
auto_routing_debug is a compact JSON string with the requested model, selected tier, reasoning effort, score, and reasons.
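Because the debug value is a JSON string rather than a nested object, it has to be decoded before its fields can be inspected. The following is a minimal Python sketch, assuming the response body has already been loaded into a dict and that the keys are named desired_tier, reasoning_effort, score, and reasons as they are written on this page. The helper name parse_routing_debug and the sample metadata are illustrative; the sample values mirror the simple_summary row of the results table and are not a captured response. Treating reasons as a list of strings is an assumption.

```python
import json


def parse_routing_debug(metadata: dict) -> dict:
    """Decode metadata["auto_routing_debug"], which arrives as a JSON string."""
    raw = metadata.get("auto_routing_debug")
    if raw is None:
        raise KeyError("auto_routing_debug missing from response metadata")
    return json.loads(raw)


# Illustrative metadata only: it shows the expected shape, not real output.
sample_metadata = {
    "routed_model": "meetkai:functionary-pt",
    "auto_routing_debug": json.dumps({
        "desired_tier": "quantized",
        "reasoning_effort": "minimal",
        "score": -3,
        "reasons": ["short prompt", "simple transform task"],  # assumed to be a list
    }),
}

debug = parse_routing_debug(sample_metadata)
print(debug["desired_tier"], debug["reasoning_effort"], debug["score"])
```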
Run the validation
The production verification on March 31, 2026 used the Responses endpoint directly with a fixed model and an opt-in debug flag.
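A request of this kind could be assembled roughly as follows. This is a sketch under stated assumptions, not the script used for the production check: the endpoint URL, model name, and the auto_routing and auto_routing_debug flags come from this page, while the input field name, their placement at the top level of the body, the bearer-token header, and the build_request helper are assumptions made for illustration.

```python
import json
import urllib.request

RESPONSES_URL = "https://apigw.mka1.com/api/v1/llm/responses"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Responses request with routing debug enabled."""
    body = {
        "model": "meetkai:functionary-pt",  # fixed model, as in the check
        "auto_routing": True,
        "auto_routing_debug": True,         # opt-in debug flag
        "input": prompt,                    # assumption: prompt field name
    }
    return urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumption: bearer auth
        },
        method="POST",
    )


req = build_request("Rewrite this sentence in plain English.", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; building and sending are kept separate here so the payload can be inspected first.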
routed_model may still equal the requested model if no compatible sibling family exists in production yet.
That does not mean the heuristic failed.
The proof is the desired_tier, reasoning_effort, score, and reasons in auto_routing_debug.
Test method
The production check used six Responses requests against https://apigw.mka1.com/api/v1/llm/responses.
Each request set:
- model: "meetkai:functionary-pt"
- auto_routing: true
- auto_routing_debug: true

The six prompts were:
- Short English transform prompt
- Medium structured-output prompt
- Long incident-analysis prompt
- Forced tool-use prompt
- Short Portuguese transform prompt
- Long Portuguese incident-analysis prompt

For each response, the check recorded:
- HTTP status
- metadata.routed_model
- parsed metadata.auto_routing_debug
- effective reasoning.effort
Live production results
These were the observed production results from March 31, 2026, after PR 321 deployed:

| Test case | Expected tier | Observed tier | Reasoning effort | Score | Key reasons |
|---|---|---|---|---|---|
| simple_summary | quantized | quantized | minimal | -3 | short prompt, simple transform task |
| structured_analysis_json | moe | moe | low | 2 | structured output requested, short prompt, complex reasoning markers in prompt |
| complex_incident_plan | dense | dense | high | 6 | large max_output_tokens, long prompt, complex reasoning markers in prompt |
| forced_code_interpreter | dense | dense | high | 5 | high-agency tools enabled, tool use required |
| portuguese_simple_transform | quantized | quantized | minimal | -3 | short prompt, simple transform task |
| portuguese_complex_incident | dense | dense | high | 5 | large max_output_tokens, complex reasoning markers in prompt |

All six requests returned 200 OK.
All six responses included auto_routing_debug.
The observed tier matched the expected tier for every case.
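The tier comparison can be expressed as a small check. The sketch below copies the expected and observed tiers from the results table and reports any case where they differ; the tier_mismatches helper is a hypothetical name introduced for illustration.

```python
# (test case, expected tier, observed tier), copied from the results table.
RESULTS = [
    ("simple_summary", "quantized", "quantized"),
    ("structured_analysis_json", "moe", "moe"),
    ("complex_incident_plan", "dense", "dense"),
    ("forced_code_interpreter", "dense", "dense"),
    ("portuguese_simple_transform", "quantized", "quantized"),
    ("portuguese_complex_incident", "dense", "dense"),
]


def tier_mismatches(results):
    """Return the names of cases whose observed tier differs from the expected one."""
    return [name for name, expected, observed in results if expected != observed]


mismatches = tier_mismatches(RESULTS)
print("all tiers matched" if not mismatches else f"mismatched: {mismatches}")
```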
Raw response excerpts
The examples below are adapted from the live production log.
Short transform prompt
Request:
Structured-output prompt
Request:
Long incident-analysis prompt
Request:
Forced tool-use prompt
Request:
Portuguese transform prompt
Request:
Portuguese incident-analysis prompt
Request:
Interpret the result
Use this checklist when validating a deployment:
- Confirm the response includes metadata.auto_routing_debug.
- Parse the JSON string and inspect desired_tier.
- Check that reasoning_effort matches the expected complexity level.
- Check that the reasons match the prompt features you intended to trigger.
- If sibling variants exist, also confirm that routed_model changes to the expected sibling.

If auto_routing_debug is missing, the deployed API image likely does not yet include the feature.
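The checklist can be automated along these lines. This is a sketch, assuming the response body is available as a dict with a metadata object, that the debug keys are named as on this page, and that reasons decodes to a list of strings. The validate_routing function and its parameters are hypothetical names, and the sample response is illustrative rather than captured output.

```python
import json


def validate_routing(response, expected_tier, expected_effort,
                     expected_reasons, expected_routed_model=None):
    """Return a list of checklist failures; an empty list means every check passed."""
    metadata = response.get("metadata") or {}
    raw = metadata.get("auto_routing_debug")
    if raw is None:
        # A missing field suggests the deployed image lacks the feature.
        return ["metadata.auto_routing_debug is missing"]

    debug = json.loads(raw)
    failures = []
    if debug.get("desired_tier") != expected_tier:
        failures.append(f"desired_tier is {debug.get('desired_tier')!r}")
    if debug.get("reasoning_effort") != expected_effort:
        failures.append(f"reasoning_effort is {debug.get('reasoning_effort')!r}")
    reasons = debug.get("reasons") or []  # assumption: list of strings
    for reason in expected_reasons:
        if reason not in reasons:
            failures.append(f"missing reason: {reason}")
    # Only check the routed model when a sibling variant is expected to exist.
    if expected_routed_model is not None:
        if metadata.get("routed_model") != expected_routed_model:
            failures.append(f"routed_model is {metadata.get('routed_model')!r}")
    return failures


# Illustrative response shaped after the forced_code_interpreter row.
sample = {"metadata": {
    "routed_model": "meetkai:functionary-pt",
    "auto_routing_debug": json.dumps({
        "desired_tier": "dense", "reasoning_effort": "high", "score": 5,
        "reasons": ["high-agency tools enabled", "tool use required"],
    }),
}}
failures = validate_routing(sample, "dense", "high", ["tool use required"])
print(failures or "all checks passed")
```

Leaving expected_routed_model unset skips the sibling check, which matches the case where no compatible sibling family is registered yet.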
Notes
- auto_routing_debug is intended for validation and rollout checks. It is opt-in and should not be enabled by default for normal production traffic.
- auto_routing_debug is currently available on the Responses API.
- Portuguese prompt heuristics are included in the current production routing logic, so short transform prompts and complex incident-analysis prompts can be validated in both English and Portuguese.