modalities and audio to enable speech output, or add the image_generation tool to produce images.
Supported output types
| Modality | How to enable | Output format |
|---|---|---|
| Text | Default — no extra config | output_text in response |
| Audio (speech) | Set modalities: ["text", "audio"] | Base64 audio + transcript |
| Image | Add image_generation tool | Image URL or base64 |
Generate audio (text-to-speech)
Request audio output by setting modalities to ["text", "audio"] and specifying a voice and format in the audio parameter. The response includes both the text transcript and base64-encoded audio data.
Audio configuration
| Parameter | Options | Default |
|---|---|---|
| voice | alloy and other voice profiles | alloy |
| format | wav, mp3, flac, opus, pcm16 | wav |
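As a minimal sketch of the request described above, the helper below assembles a request body that asks for text plus speech. Only modalities, audio, voice, and format come from this page; the model and input field names and the placeholder model ID are assumptions for illustration, so check them against the API reference before use.

```python
import json

# Formats listed in the audio configuration table.
SUPPORTED_FORMATS = {"wav", "mp3", "flac", "opus", "pcm16"}


def build_audio_request(prompt, voice="alloy", audio_format="wav", model="your-model-id"):
    """Return a request body asking for text plus speech output.

    `model` and `input` are assumed field names; "your-model-id" is a
    hypothetical placeholder, not a real model name.
    """
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    return {
        "model": model,                    # hypothetical placeholder
        "input": prompt,                   # assumed field name
        "modalities": ["text", "audio"],   # enables speech output
        "audio": {"voice": voice, "format": audio_format},
    }


if __name__ == "__main__":
    body = build_audio_request("Read this sentence aloud.", audio_format="mp3")
    print(json.dumps(body, indent=2))
```

Validating the format on the client side catches typos before the request is sent; voice is passed through unchecked because the full list of voice profiles is not given here.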
Save audio to a file
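Because the audio arrives base64-encoded, saving it comes down to decoding the string and writing the bytes to disk. The sketch below assumes you have already pulled the base64 string out of the response; where that string sits in the response body is not specified here, so the helper takes it as a plain argument. The name save_audio is hypothetical.

```python
import base64
import tempfile
from pathlib import Path


def save_audio(audio_b64, path):
    """Decode base64 audio data, write it to `path`, and return the byte count."""
    audio_bytes = base64.b64decode(audio_b64)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)


if __name__ == "__main__":
    # Stand-in payload for demonstration only; a real response carries actual audio data.
    fake_audio_b64 = base64.b64encode(b"not real audio").decode("ascii")
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "speech.wav"
        written = save_audio(fake_audio_b64, target)
        print(f"wrote {written} bytes to {target.name}")
```

Match the file extension to the format you requested (wav by default) so that players recognize the file.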
Supported languages
Audio output supports automatic language detection and 20+ languages, including English, Chinese, Hindi, Spanish, Arabic, Bengali, Portuguese, Russian, Japanese, Punjabi, German, Korean, French, Turkish, Italian, Thai, Polish, Dutch, Indonesian, Vietnamese, and Urdu.
Generate images
Use the image_generation tool to create images from text prompts. The model interprets your message, generates a prompt for the image model, and returns the result.
Image generation models
| Model | Best for |
|---|---|
| meetkai:flux-2-klein | Fast generation, general purpose (default) |
| meetkai:z-image-turbo | High-quality, detailed images |
Image generation options
| Parameter | Options | Default |
|---|---|---|
| size | 1024x1024, 1024x1536, 1536x1024, auto | auto |
| quality | low, medium, high, auto | auto |
| output_format | png, webp, jpeg | png |
| background | transparent, opaque, auto | auto |
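The options above can be checked before a request is sent. The following sketch builds a tool entry and rejects values that are not in the table. It assumes the options sit next to a type key inside the tool entry, which this page does not confirm; build_image_tool is a hypothetical helper name.

```python
# Allowed values copied from the image generation options table.
IMAGE_OPTIONS = {
    "size": {"1024x1024", "1024x1536", "1536x1024", "auto"},
    "quality": {"low", "medium", "high", "auto"},
    "output_format": {"png", "webp", "jpeg"},
    "background": {"transparent", "opaque", "auto"},
}


def build_image_tool(**options):
    """Return an image generation tool entry after validating its options.

    Assumption: options are placed alongside `type` in the tool entry.
    """
    for name, value in options.items():
        if name not in IMAGE_OPTIONS:
            raise ValueError(f"unknown option: {name}")
        if value not in IMAGE_OPTIONS[name]:
            raise ValueError(f"invalid {name}: {value!r}")
    return {"type": "image_generation", **options}


if __name__ == "__main__":
    print(build_image_tool(size="1024x1536", background="transparent"))
```

Options that are left out are simply omitted from the entry, so the documented defaults apply.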
Force image generation
Use tool_choice to ensure the model generates an image rather than responding with text only.
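One way this might look in a request body is sketched below. The exact shape of the tool_choice value is not given on this page; the {"type": "image_generation"} form used here is an assumption, as is the input field name, so confirm both against the API reference. The helper name force_image_generation is hypothetical.

```python
import copy


def force_image_generation(request):
    """Return a copy of `request` that requires the image generation tool."""
    forced = copy.deepcopy(request)  # leave the caller's dict untouched
    tools = forced.setdefault("tools", [])
    # Make sure the tool is present before forcing it.
    if not any(tool.get("type") == "image_generation" for tool in tools):
        tools.append({"type": "image_generation"})
    forced["tool_choice"] = {"type": "image_generation"}  # assumed shape
    return forced


if __name__ == "__main__":
    base = {"input": "Draw a lighthouse at dusk."}  # `input` is an assumed field name
    print(force_image_generation(base))
```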
Image output structure
The response output array contains these items when an image is generated:
- function_call: the model’s call to the image generation tool with the refined prompt
- image_generation_call: the generation result with status: "completed" and result (image URL)
- function_call_output: the raw tool output containing the URL
- message: the model’s text response describing or referencing the image
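To pull the image out of such a response, scan the output array for the completed image_generation_call item and read its result. The sketch below assumes each item is a JSON object whose type field holds one of the names listed above; the sample data, including the example.com URL, is made up for illustration.

```python
def extract_image_result(output):
    """Return `result` from the first completed image_generation_call, or None."""
    for item in output:
        if item.get("type") == "image_generation_call" and item.get("status") == "completed":
            return item.get("result")
    return None


if __name__ == "__main__":
    # Hypothetical output array; real items carry more fields than shown here.
    sample_output = [
        {"type": "function_call"},
        {
            "type": "image_generation_call",
            "status": "completed",
            "result": "https://example.com/image.png",
        },
        {"type": "function_call_output"},
        {"type": "message"},
    ]
    print(extract_image_result(sample_output))
```

Returning None when no completed call is found lets the caller fall back to the text in the message item.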
Standalone APIs
For direct access without going through the Responses API, MKA1 also provides standalone endpoints:
- Text-to-speech API
- Images API
Next steps
- Multimodal input — send images, audio, and documents to the model
- Advanced voice mode — real-time voice conversations with LiveKit
- Generate a response — text requests and multi-turn exchanges