
The MKA1 API can return text, audio, and images. Text is the default output modality. Set the modalities and audio request parameters to enable speech output, or add the image_generation tool to produce images.

Supported output types

Modality        How to enable                      Output format
Text            Default (no extra config)          output_text in response
Audio (speech)  Set modalities: ["text", "audio"]  Base64 audio + transcript
Image           Add image_generation tool          Image URL or base64

Generate audio (text-to-speech)

Request audio output by setting modalities to ["text", "audio"] and specifying a voice and format in the audio parameter. The response includes both the text transcript and base64-encoded audio data.

Audio configuration

Parameter  Options                         Default
voice      alloy and other voice profiles  alloy
format     wav, mp3, flac, opus, pcm16     wav

Audio is synthesized at 24 kHz, 16-bit mono.
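That sample format fixes the data rate: 24,000 samples/s × 2 bytes × 1 channel = 48,000 bytes per second. The sketch below uses that figure to estimate playback length from a file size. The helper names are hypothetical, and two details are assumptions rather than documented behavior: that pcm16 output is headerless raw samples, and that wav output uses the common 44-byte RIFF header.

```shell
#!/bin/sh
# Documented output format: 24 kHz, 16-bit, mono.
SAMPLE_RATE=24000      # samples per second
BYTES_PER_SAMPLE=2     # 16-bit samples
CHANNELS=1             # mono
BYTES_PER_SECOND=$((SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS))   # 48000

# pcm16_duration_ms BYTE_COUNT  (hypothetical helper)
# Prints the duration, in whole milliseconds, of raw PCM16 audio with no header.
pcm16_duration_ms() {
  echo $(( $1 * 1000 / BYTES_PER_SECOND ))
}

# wav_duration_ms BYTE_COUNT  (hypothetical helper)
# Same estimate for a WAV file, assuming a plain 44-byte RIFF header.
# Files with extra header chunks will be slightly overestimated.
wav_duration_ms() {
  echo $(( ($1 - 44) * 1000 / BYTES_PER_SECOND ))
}
```

For example, `wav_duration_ms "$(wc -c < output.wav)"` estimates the length of a saved WAV response; one second of audio is 48,000 bytes of samples.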
mka1 llm responses create \
  -H 'X-On-Behalf-Of: <end-user-id>' \
  --body '{
    "model": "auto",
    "input": "Say hello in a friendly way. Keep it very short.",
    "modalities": ["text", "audio"],
    "audio": { "voice": "alloy", "format": "wav" }
  }'
The response contains an output_audio item with the base64-encoded audio and a transcript of what was spoken:
{
  "status": "completed",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Hello!" }
      ]
    },
    {
      "type": "output_audio",
      "id": "audio_460caf1079b34fa0b4aa74448dff4ea7",
      "data": "<Base64-encoded WAV audio data>",
      "transcript": "Hi there!",
      "status": "completed"
    }
  ]
}
The data field contains the full audio file (268 KB in this example). The transcript field contains the text the model chose to speak, which may differ slightly from the text output (here, "Hi there!" versus "Hello!").
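Because data is base64 text, the JSON string is about 4/3 the size of the decoded file, so a 268 KB audio file arrives as roughly 357 KB of base64. The sketch below computes the exact decoded byte count from the encoded string, which is a cheap sanity check before writing a file. The helper name is hypothetical, and it assumes standard base64 with `=` padding (any line breaks or spaces are ignored).

```shell
#!/bin/sh
# base64_decoded_size B64_STRING  (hypothetical helper)
# Prints how many bytes the string decodes to: every 4 characters carry
# 3 bytes, minus one byte per trailing "=" padding character.
base64_decoded_size() {
  s=$(printf '%s' "$1" | tr -d '\n\r ')   # drop line breaks and spaces
  len=${#s}
  pad=0
  case "$s" in
    *==) pad=2 ;;
    *=)  pad=1 ;;
  esac
  echo $(( len / 4 * 3 - pad ))
}
```

For example, `base64_decoded_size aGVsbG8=` prints 5, the length of "hello".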

Save audio to a file

# Generate audio and extract the base64 data, then decode to a file
mka1 llm responses create \
  --body '{
    "model": "auto",
    "input": "Read this sentence aloud: The quick brown fox jumps over the lazy dog.",
    "modalities": ["text", "audio"],
    "audio": { "voice": "alloy", "format": "mp3" }
  }' \
  --output-format json \
  --jq '.output[] | select(.type == "output_audio") | .data' | base64 -d > output.mp3
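If the jq filter matches nothing, or a different format comes back, the pipeline above still writes a file. Checking the file's first bytes (its magic number) is a quick way to confirm the container matches the requested format. The signatures used here are the standard ones for each container; the helper name is hypothetical, and treating opus output as an Ogg stream and pcm16 output as headerless are assumptions about how MKA1 packages those formats.

```shell
#!/bin/sh
# detect_audio_format FILE  (hypothetical helper)
# Prints wav, flac, ogg, mp3, or unknown based on the file's first 4 bytes.
detect_audio_format() {
  # First 4 bytes as lowercase hex with no spaces, e.g. "52494646" for "RIFF".
  hex=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  case "$hex" in
    52494646) echo wav ;;     # "RIFF" header used by WAV
    664c6143) echo flac ;;    # "fLaC"
    4f676753) echo ogg ;;     # "OggS" (assumed container for opus output)
    494433*)  echo mp3 ;;     # "ID3" tag at the start of an MP3
    ff[ef]*)  echo mp3 ;;     # MPEG audio frame sync with no ID3 tag
    *)        echo unknown ;; # empty file, raw pcm16, or unrecognized data
  esac
}
```

For example, `[ "$(detect_audio_format output.mp3)" = mp3 ] || echo "output.mp3 is not MP3 data" >&2` flags an empty or mismatched download.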

Supported languages

Audio output supports automatic language detection and 20+ languages including English, Chinese, Hindi, Spanish, Arabic, Bengali, Portuguese, Russian, Japanese, Punjabi, German, Korean, French, Turkish, Italian, Thai, Polish, Dutch, Indonesian, Vietnamese, and Urdu.

Generate images

Use the image_generation tool to create images from text prompts. The model interprets your message, generates a prompt for the image model, and returns the result.

Image generation models

Model                  Best for
meetkai:flux-2-klein   Fast generation, general purpose (default)
meetkai:z-image-turbo  High-quality, detailed images

Image generation options

Parameter      Options                                Default
size           1024x1024, 1024x1536, 1536x1024, auto  auto
quality        low, medium, high, auto                auto
output_format  png, webp, jpeg                        png
background     transparent, opaque, auto              auto
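A typo in one of these values is easier to catch before the request is sent. The sketch below checks each option against the table, fills in the documented defaults, and prints the image_generation tool object as compact JSON. The helper name is hypothetical, and rejecting a transparent background with jpeg output is our own local choice (JPEG has no alpha channel); how the API itself handles that pair is not documented here.

```shell
#!/bin/sh
# image_tool_json [SIZE] [QUALITY] [OUTPUT_FORMAT] [BACKGROUND]  (hypothetical helper)
# Validates options against the documented values, applies the documented
# defaults, and prints an image_generation tool object as compact JSON.
image_tool_json() {
  size=${1:-auto}; quality=${2:-auto}; fmt=${3:-png}; bg=${4:-auto}
  case "$size" in
    1024x1024|1024x1536|1536x1024|auto) ;;
    *) echo "invalid size: $size" >&2; return 1 ;;
  esac
  case "$quality" in
    low|medium|high|auto) ;;
    *) echo "invalid quality: $quality" >&2; return 1 ;;
  esac
  case "$fmt" in
    png|webp|jpeg) ;;
    *) echo "invalid output_format: $fmt" >&2; return 1 ;;
  esac
  case "$bg" in
    transparent|opaque|auto) ;;
    *) echo "invalid background: $bg" >&2; return 1 ;;
  esac
  # JPEG cannot store transparency, so refuse this pair locally.
  if [ "$bg" = transparent ] && [ "$fmt" = jpeg ]; then
    echo "transparent background requires png or webp" >&2; return 1
  fi
  printf '{"type":"image_generation","size":"%s","quality":"%s","output_format":"%s","background":"%s"}\n' \
    "$size" "$quality" "$fmt" "$bg"
}
```

Capture the result and stop on failure, for example `tool=$(image_tool_json 1024x1536 medium webp) || exit 1`, then splice `$tool` into the "tools" array of the --body JSON.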
mka1 llm responses create --body '{
  "model": "auto",
  "input": "Generate an image of a sunset over a mountain lake.",
  "tools": [
    {
      "type": "image_generation",
      "model": "auto",
      "quality": "high",
      "size": "1024x1024",
      "output_format": "png"
    }
  ]
}'
The response includes an image_generation_call item with the generated image URL and the revised prompt used by the image model:
{
  "status": "completed",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "I'll generate an image of a beautiful sunset over a mountain lake for you."
        }
      ]
    },
    {
      "type": "image_generation_call",
      "id": "ig_abc123",
      "status": "completed",
      "result": "<Generated Image URL>",
      "revised_prompt": "A breathtaking sunset over a pristine mountain lake, with golden and orange hues reflecting on the calm water surface. Snow-capped mountain peaks in the background, dramatic clouds in the sky with vibrant sunset colors of pink, purple, and orange.",
      "size": "auto",
      "quality": "auto",
      "output_format": "png"
    }
  ]
}
The result field contains a URL to the generated image. The revised_prompt field shows the expanded prompt the image model actually used: the LLM turns your brief instruction into a detailed image description.

Force image generation

Use tool_choice to ensure the model generates an image rather than responding with text only.
mka1 llm responses create --body '{
  "model": "auto",
  "input": "A red circle on a white background.",
  "tools": [{ "type": "image_generation" }],
  "tool_choice": { "type": "image_generation" }
}'

Image output structure

When an image is generated, the response output array can include these items:
  1. function_call: the model’s call to the image generation tool with the refined prompt
  2. image_generation_call: the generation result, with status: "completed" and result (the image URL)
  3. function_call_output: the raw tool output containing the URL
  4. message: the model’s text response describing or referencing the image
Image URLs expire after 1 hour. Download or cache them if you need long-term access.
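Because the URLs expire, it is safest to persist the image as soon as the response arrives. The sketch below handles both output forms listed under Supported output types (URL or base64). The helper name is hypothetical, and two choices are assumptions: using curl for the download, and accepting a data: URL prefix in front of base64 content.

```shell
#!/bin/sh
# save_image_result RESULT DEST  (hypothetical helper)
# RESULT is the image_generation_call "result" value: an http(s) URL,
# a data: URL, or bare base64. Writes the image bytes to DEST.
save_image_result() {
  result=$1
  dest=$2
  case "$result" in
    http://*|https://*)
      # Download right away; image URLs expire after 1 hour.
      curl -fsSL -o "$dest" "$result" ;;
    data:*)
      # Strip everything up to the first comma, e.g. "data:image/png;base64,".
      printf '%s' "${result#*,}" | base64 -d > "$dest" ;;
    *)
      # Treat anything else as bare base64 image data.
      printf '%s' "$result" | base64 -d > "$dest" ;;
  esac
}
```

The value can come straight from the CLI's --jq filter, for example `save_image_result "$(mka1 llm responses create --body '…' --output-format json --jq '.output[] | select(.type == "image_generation_call") | .result')" sunset.png`.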

Standalone APIs

For direct access without going through the Responses API, MKA1 also provides standalone endpoints:

Text-to-speech API

mka1 llm speech speak \
  --text 'Hello, welcome to the MKA1 platform.' \
  --language en \
  --output-file output.wav

Images API

mka1 llm images create \
  --model auto \
  --prompt 'A futuristic city skyline at dusk' \
  --size 1024x1024 \
  --quality hd

Next steps