modalities and audio to enable speech output, or add the image_generation tool to produce images.
Supported output types
| Modality | How to enable | Output format |
|---|---|---|
| Text | Default — no extra config | output_text in response |
| Audio (speech) | Set modalities: ["text", "audio"] | Base64 audio + transcript |
| Image | Add image_generation tool | Image URL or base64 |
Generate audio (text-to-speech)
Request audio output by setting modalities to ["text", "audio"] and specifying a voice and format in the audio parameter. The response includes both the text transcript and base64-encoded audio data.
Audio configuration
| Parameter | Options | Default |
|---|---|---|
| voice | alloy and other voice profiles | alloy |
| format | wav, mp3, flac, opus, pcm16 | wav |
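The parameters above can be sketched as a request body. This is a minimal illustration, assuming the Responses API accepts a JSON payload with model, input, modalities, and audio fields. The helper name build_audio_request and the model ID "your-model-id" are placeholders, not part of the MKA1 API; only the modalities value and the voice and format options come from the table above.

```python
# Sketch only: the overall payload shape is an assumption.
# The format values are the ones listed in the audio configuration table.
SUPPORTED_FORMATS = {"wav", "mp3", "flac", "opus", "pcm16"}


def build_audio_request(text, voice="alloy", audio_format="wav", model="your-model-id"):
    """Build a request body asking for text plus speech output."""
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    return {
        "model": model,  # placeholder model ID (assumption)
        "input": text,  # field name is an assumption
        "modalities": ["text", "audio"],
        "audio": {"voice": voice, "format": audio_format},
    }


payload = build_audio_request("Say hello in a friendly tone.", audio_format="mp3")
```

Validating the format locally catches typos before the request is sent. The voice is not validated because the full list of voice profiles is not given here.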
The response includes an output_audio item with the base64-encoded audio and a transcript of what was spoken.
The data field contains the full audio file (268 KB in this example). The transcript field contains the text the model chose to speak, which may differ slightly from the text output.
Save audio to a file
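Because the audio arrives as base64 text, it has to be decoded before it is written to disk. The sketch below assumes the response has been parsed into a dict and that the output_audio item sits either directly in the output list or inside an item's content list; that layout is an assumption, and find_output_audio and save_audio are illustrative helper names, not part of the API.

```python
import base64
from pathlib import Path


def find_output_audio(response):
    """Return the first output_audio item, or None if there is none.

    Checks top-level output items and any nested content lists, since the
    exact nesting is an assumption.
    """
    for item in response.get("output", []):
        if item.get("type") == "output_audio":
            return item
        for part in item.get("content") or []:
            if isinstance(part, dict) and part.get("type") == "output_audio":
                return part
    return None


def save_audio(response, path):
    """Decode the base64 data field and write it to path; return the byte count."""
    audio = find_output_audio(response)
    if audio is None:
        raise ValueError("response contains no output_audio item")
    audio_bytes = base64.b64decode(audio["data"])
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

Call save_audio(response, "speech.wav") with a file extension that matches the format you requested.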
Supported languages
Audio output supports automatic language detection and 20+ languages, including English, Chinese, Hindi, Spanish, Arabic, Bengali, Portuguese, Russian, Japanese, Punjabi, German, Korean, French, Turkish, Italian, Thai, Polish, Dutch, Indonesian, Vietnamese, and Urdu.
Generate images
Use the image_generation tool to create images from text prompts. The model interprets your message, generates a prompt for the image model, and returns the result.
Image generation models
| Model | Best for |
|---|---|
| meetkai:flux-2-klein | Fast generation, general purpose (default) |
| meetkai:z-image-turbo | High-quality, detailed images |
Image generation options
| Parameter | Options | Default |
|---|---|---|
| size | 1024x1024, 1024x1536, 1536x1024, auto | auto |
| quality | low, medium, high, auto | auto |
| output_format | png, webp, jpeg | png |
| background | transparent, opaque, auto | auto |
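The options above can be combined into a tool entry along these lines. This is a sketch under assumptions: that the options are passed as keys on the image_generation tool object, and that the request uses model, input, and tools fields. build_image_tool and "your-model-id" are hypothetical names used for illustration.

```python
# Option values are taken from the table above; where they sit in the
# payload (as keys on the tool entry) is an assumption.
IMAGE_OPTIONS = {
    "size": {"1024x1024", "1024x1536", "1536x1024", "auto"},
    "quality": {"low", "medium", "high", "auto"},
    "output_format": {"png", "webp", "jpeg"},
    "background": {"transparent", "opaque", "auto"},
}


def build_image_tool(model="meetkai:flux-2-klein", **options):
    """Return an image_generation tool entry with validated options."""
    tool = {"type": "image_generation", "model": model}
    for name, value in options.items():
        allowed = IMAGE_OPTIONS.get(name)
        if allowed is None:
            raise ValueError(f"unknown option: {name}")
        if value not in allowed:
            raise ValueError(f"invalid {name}: {value!r}")
        tool[name] = value
    return tool


tool = build_image_tool(size="1024x1536", output_format="webp")
request_body = {
    "model": "your-model-id",  # placeholder model ID (assumption)
    "input": "Draw a lighthouse at dusk in watercolor.",
    "tools": [tool],
}
```

Options you leave out are simply omitted from the tool entry, so the defaults listed in the table apply.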
The response includes an image_generation_call item with the generated image URL and the revised prompt used by the image model.
The result field contains a URL to the generated image. The revised_prompt field shows the expanded prompt the image model used: the LLM enhances your brief instruction into a detailed image description.
Force image generation
Use tool_choice to ensure the model generates an image rather than responding with text only.
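A hedged sketch of what forcing the tool might look like. The exact value tool_choice accepts is not stated here; the {"type": "image_generation"} form below is an assumption modeled on similar Responses-style APIs, so check the API reference before relying on it. force_image_generation is a hypothetical helper name.

```python
def force_image_generation(request_body):
    """Return a copy of request_body that requires the image tool.

    Assumption: tool_choice accepts {"type": "image_generation"}.
    """
    body = dict(request_body)
    tools = list(body.get("tools", []))
    # Make sure the tool being forced is actually declared.
    if not any(t.get("type") == "image_generation" for t in tools):
        tools.append({"type": "image_generation"})
    body["tools"] = tools
    body["tool_choice"] = {"type": "image_generation"}
    return body


forced = force_image_generation({"model": "your-model-id", "input": "A red bicycle."})
```

The helper copies the request instead of mutating it, so the same base request can be reused for text-only calls.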
Image output structure
The response output array contains these items when an image is generated:
- function_call: the model’s call to the image generation tool with the refined prompt
- image_generation_call: the generation result with status: "completed" and result (image URL)
- function_call_output: the raw tool output containing the URL
- message: the model’s text response describing or referencing the image
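To pull the image out of that array, you can filter for completed image_generation_call items. The sketch below assumes the response is a parsed dict and that each output item carries a type key; the example response is made up for illustration and contains only the fields named above.

```python
def extract_images(response):
    """Collect result values from completed image_generation_call items."""
    return [
        item["result"]
        for item in response.get("output", [])
        if item.get("type") == "image_generation_call"
        and item.get("status") == "completed"
        and item.get("result")
    ]


# Made-up response with the four item types described above.
example = {
    "output": [
        {"type": "function_call"},
        {
            "type": "image_generation_call",
            "status": "completed",
            "result": "https://example.com/image.png",
        },
        {"type": "function_call_output"},
        {"type": "message"},
    ]
}

print(extract_images(example))  # ['https://example.com/image.png']
```

Checking status before reading result avoids picking up a call that failed or has not finished.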
Standalone APIs
For direct access without going through the Responses API, MKA1 also provides standalone endpoints:
- Text-to-speech API
- Images API
Next steps
- Multimodal input — send images, audio, and documents to the model
- Speech — transcribe audio and generate speech with the standalone speech endpoints
- Advanced voice mode — real-time voice conversations with LiveKit
- Generate a response — text requests and multi-turn exchanges