# Multimodal output

> Generate audio speech and images from the MKA1 API using the Responses resource.

The MKA1 API can return text, audio, and images. Text is the default output modality.
Set the `modalities` and `audio` request parameters to enable speech output, or add the `image_generation` tool to produce images.

## Supported output types

| Modality       | How to enable                       | Output format             |
| -------------- | ----------------------------------- | ------------------------- |
| Text           | Default — no extra config           | `output_text` in response |
| Audio (speech) | Set `modalities: ["text", "audio"]` | Base64 audio + transcript |
| Image          | Add `image_generation` tool         | Image URL or base64       |

## Generate audio (text-to-speech)

Request audio output by setting `modalities` to `["text", "audio"]` and specifying a voice and format in the `audio` parameter. The response includes both the text transcript and base64-encoded audio data.

### Audio configuration

| Parameter | Options                               | Default |
| --------- | ------------------------------------- | ------- |
| `voice`   | `alloy` and other voice profiles      | `alloy` |
| `format`  | `wav`, `mp3`, `flac`, `opus`, `pcm16` | `wav`   |

Audio is synthesized at 24 kHz, 16-bit mono.
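
As a rough sizing guide, 24 kHz, 16-bit mono works out to 48,000 bytes of raw PCM per second, and base64 encoding inflates any payload by about one third. The sketch below works through that arithmetic. It ignores container headers and only applies to uncompressed output (`wav` or `pcm16`), so treat the numbers as estimates; the function names are illustrative and not part of any MKA1 SDK.

```python
import math

SAMPLE_RATE_HZ = 24_000   # documented synthesis rate
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def pcm_bytes(seconds: float) -> int:
    """Raw PCM size for a clip of the given duration (no container header)."""
    return int(seconds * BYTES_PER_SECOND)


def base64_chars(raw_bytes: int) -> int:
    """Length of the padded base64 string that encodes raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def duration_seconds(raw_bytes: int) -> float:
    """Approximate playback time for raw_bytes bytes of uncompressed audio."""
    return raw_bytes / BYTES_PER_SECOND


if __name__ == "__main__":
    five_seconds = pcm_bytes(5.0)
    print(five_seconds)                    # 240000
    print(base64_chars(five_seconds))      # 320000
    print(duration_seconds(five_seconds))  # 5.0
```

Compressed formats such as `mp3` or `opus` will be considerably smaller, so this estimate is an upper bound for them.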

<CodeGroup>
  ```bash CLI theme={null}
  mka1 llm responses create \
    -H 'X-On-Behalf-Of: <end-user-id>' \
    --body '{
      "model": "auto",
      "input": "Say hello in a friendly way. Keep it very short.",
      "modalities": ["text", "audio"],
      "audio": { "voice": "alloy", "format": "wav" }
    }'
  ```

  ```ts MKA1 SDK theme={null}
  import { SDK } from '@meetkai/mka1';

  const mka1 = new SDK({
    bearerAuth: `Bearer ${YOUR_API_KEY}`,
  });

  const result = await mka1.llm.responses.create({
    model: 'auto',
    input: 'Say hello in a friendly way. Keep it very short.',
    modalities: ['text', 'audio'],
    audio: { voice: 'alloy', format: 'wav' },
  }, { headers: { 'X-On-Behalf-Of': '<end-user-id>' } });

  // The output includes an output_audio item with base64 data and a transcript
  ```

  ```ts OpenAI SDK theme={null}
  import OpenAI from 'openai';

  const openai = new OpenAI({
    apiKey: '<mka1-api-key>',
    baseURL: 'https://apigw.mka1.com/api/v1/llm/',
    defaultHeaders: { 'X-On-Behalf-Of': '<end-user-id>' },
  });

  const response = await openai.responses.create({
    model: 'auto',
    input: 'Say hello in a friendly way. Keep it very short.',
    modalities: ['text', 'audio'],
    audio: { voice: 'alloy', format: 'wav' },
    stream: false,
  });

  // Find the audio output
  const audioItem = response.output.find((item) => item.type === 'output_audio');
  // audioItem.data contains base64-encoded WAV
  // audioItem.transcript contains the spoken text
  ```

  ```csharp C# SDK theme={null}
  using MeetKai.MKA1;
  using MeetKai.MKA1.Types.Components;

  var sdk = new SDK(
      bearerAuth: $"Bearer {YOUR_API_KEY}",
      serverUrl: "https://apigw.mka1.com"
  );

  var result = await sdk.Llm.Responses.CreateAsync(new ResponsesCreateRequest()
  {
      Model = "auto",
      Input = ResponsesCreateRequestInput.CreateStr("Say hello in a friendly way. Keep it very short."),
      Modalities = new List<ResponsesCreateRequestModality>
      {
          ResponsesCreateRequestModality.Text,
          ResponsesCreateRequestModality.Audio,
      },
      Audio = new Audio()
      {
          Voice = "alloy",
          Format = ResponsesCreateRequestFormat.Wav,
      },
  });

  // The output includes an output_audio item with base64 data and a transcript
  ```

  ```python Python SDK theme={null}
  from mka1 import SDK

  sdk = SDK(bearer_auth="Bearer YOUR_API_KEY")

  result = sdk.llm.responses.create(
      model="auto",
      input="Say hello in a friendly way. Keep it very short.",
      modalities=["text", "audio"],
      audio={"voice": "alloy", "format": "wav"},
  )

  # The output includes an output_audio item with base64 data and a transcript
  ```

  ```bash bash theme={null}
  curl https://apigw.mka1.com/api/v1/llm/responses \
    --request POST \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <mka1-api-key>' \
    --header 'X-On-Behalf-Of: <end-user-id>' \
    --data '{
      "model": "auto",
      "input": "Say hello in a friendly way. Keep it very short.",
      "modalities": ["text", "audio"],
      "audio": { "voice": "alloy", "format": "wav" }
    }'
  ```
</CodeGroup>

The response contains an `output_audio` item with the base64-encoded audio and a transcript of what was spoken:

```json theme={null}
{
  "status": "completed",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Hello!" }
      ]
    },
    {
      "type": "output_audio",
      "id": "audio_460caf1079b34fa0b4aa74448dff4ea7",
      "data": "<Base64-encoded WAV audio data>",
      "transcript": "Hi there!",
      "status": "completed"
    }
  ]
}
```

The `data` field contains the full audio file (268 KB in this example). The `transcript` field contains the text the model chose to speak, which may differ slightly from the text output: in the example above, the `output_text` is "Hello!" while the transcript is "Hi there!".

### Save audio to a file

<CodeGroup>
  ```bash CLI theme={null}
  # Generate audio and extract the base64 data, then decode to a file
  mka1 llm responses create \
    --body '{
      "model": "auto",
      "input": "Read this sentence aloud: The quick brown fox jumps over the lazy dog.",
      "modalities": ["text", "audio"],
      "audio": { "voice": "alloy", "format": "mp3" }
    }' \
    --output-format json \
    --jq '.output[] | select(.type == "output_audio") | .data' | base64 -d > output.mp3
  ```

  ```ts MKA1 SDK theme={null}
  import { writeFileSync } from 'fs';

  const result = await mka1.llm.responses.create({
    model: 'auto',
    input: 'Read this sentence aloud: The quick brown fox jumps over the lazy dog.',
    modalities: ['text', 'audio'],
    audio: { voice: 'alloy', format: 'mp3' },
  });

  // Find the audio output in the response
  const audioItem = result.output.find((item) => item.type === 'output_audio');
  if (audioItem) {
    const audioBuffer = Buffer.from(audioItem.data, 'base64');
    writeFileSync('output.mp3', audioBuffer);
  }
  ```

  ```ts OpenAI SDK theme={null}
  import { writeFileSync } from 'fs';

  const response = await openai.responses.create({
    model: 'auto',
    input: 'Read this sentence aloud: The quick brown fox jumps over the lazy dog.',
    modalities: ['text', 'audio'],
    audio: { voice: 'alloy', format: 'mp3' },
    stream: false,
  });

  const audioItem = response.output.find((item) => item.type === 'output_audio');
  if (audioItem) {
    const audioBuffer = Buffer.from(audioItem.data, 'base64');
    writeFileSync('output.mp3', audioBuffer);
  }
  ```

  ```csharp C# SDK theme={null}
  var result = await sdk.Llm.Responses.CreateAsync(new ResponsesCreateRequest()
  {
      Model = "auto",
      Input = ResponsesCreateRequestInput.CreateStr(
          "Read this sentence aloud: The quick brown fox jumps over the lazy dog."),
      Modalities = new List<ResponsesCreateRequestModality>
      {
          ResponsesCreateRequestModality.Text,
          ResponsesCreateRequestModality.Audio,
      },
      Audio = new Audio()
      {
          Voice = "alloy",
          Format = ResponsesCreateRequestFormat.Mp3,
      },
  });

  // Save the audio output to a file
  // (iterate result.Output to find the output_audio item and decode its base64 data)
  ```

  ```python Python SDK theme={null}
  import base64

  result = sdk.llm.responses.create(
      model="auto",
      input="Read this sentence aloud: The quick brown fox jumps over the lazy dog.",
      modalities=["text", "audio"],
      audio={"voice": "alloy", "format": "mp3"},
  )

  # Find the audio output in the response
  for item in result.output:
      if item.type == "output_audio":
          audio_bytes = base64.b64decode(item.data)
          with open("output.mp3", "wb") as f:
              f.write(audio_bytes)
  ```

  ```bash bash theme={null}
  # Generate audio and extract the base64 data, then decode to a file
  curl -s https://apigw.mka1.com/api/v1/llm/responses \
    --request POST \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <mka1-api-key>' \
    --data '{
      "model": "auto",
      "input": "Read this sentence aloud: The quick brown fox jumps over the lazy dog.",
      "modalities": ["text", "audio"],
      "audio": { "voice": "alloy", "format": "mp3" }
    }' | jq -r '.output[] | select(.type == "output_audio") | .data' | base64 -d > output.mp3
  ```
</CodeGroup>
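
The examples above save `mp3` output directly, but `pcm16` may need an extra step. Assuming `pcm16` returns headerless little-endian samples at the documented 24 kHz, 16-bit mono (an assumption; the byte layout is not described here), most players need a container before they can open the data. A hedged sketch using the standard `wave` module, with `pcm16_to_wav` as an illustrative helper name:

```python
import base64
import io
import wave


def pcm16_to_wav(b64_pcm: str, sample_rate: int = 24_000, channels: int = 1) -> bytes:
    """Wrap base64-encoded raw 16-bit PCM in a WAV container and return the bytes."""
    pcm = base64.b64decode(b64_pcm)
    if len(pcm) % (2 * channels) != 0:
        raise ValueError("PCM payload is not a whole number of 16-bit frames")

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)  # assumed to match the synthesis rate
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    # 100 ms of silence stands in for the `data` field of an API response.
    fake_payload = base64.b64encode(b"\x00\x00" * 2_400).decode("ascii")
    wav_bytes = pcm16_to_wav(fake_payload)
    print(wav_bytes[:4], len(wav_bytes))
```

Writing `wav_bytes` to a file opened in `"wb"` mode then gives a playable `.wav` file.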

### Supported languages

Audio output supports automatic language detection and 20+ languages including English, Chinese, Hindi, Spanish, Arabic, Bengali, Portuguese, Russian, Japanese, Punjabi, German, Korean, French, Turkish, Italian, Thai, Polish, Dutch, Indonesian, Vietnamese, and Urdu.

## Generate images

Use the `image_generation` tool to create images from text prompts. The model interprets your message, generates a prompt for the image model, and returns the result.

### Image generation models

| Model                   | Best for                                   |
| ----------------------- | ------------------------------------------ |
| `meetkai:flux-2-klein`  | Fast generation, general purpose (default) |
| `meetkai:z-image-turbo` | High-quality, detailed images              |

### Image generation options

| Parameter       | Options                                       | Default |
| --------------- | --------------------------------------------- | ------- |
| `size`          | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | `auto`  |
| `quality`       | `low`, `medium`, `high`, `auto`               | `auto`  |
| `output_format` | `png`, `webp`, `jpeg`                         | `png`   |
| `background`    | `transparent`, `opaque`, `auto`               | `auto`  |
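
Because these options are plain strings, a typo such as `1024x1204` can go unnoticed until the request is sent. The sketch below builds an `image_generation` tool definition and checks each value against the table above. `build_image_tool` is a hypothetical client-side helper, and how the API itself treats unknown values is not covered here.

```python
# Allowed values copied from the options table; update if the table changes.
ALLOWED = {
    "size": {"1024x1024", "1024x1536", "1536x1024", "auto"},
    "quality": {"low", "medium", "high", "auto"},
    "output_format": {"png", "webp", "jpeg"},
    "background": {"transparent", "opaque", "auto"},
}


def build_image_tool(model: str = "auto", **options: str) -> dict:
    """Return a tool definition, raising ValueError on unknown names or values."""
    tool = {"type": "image_generation", "model": model}
    for name, value in options.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown option: {name}")
        if value not in ALLOWED[name]:
            expected = ", ".join(sorted(ALLOWED[name]))
            raise ValueError(f"invalid {name} {value!r}; expected one of: {expected}")
        tool[name] = value
    return tool


if __name__ == "__main__":
    print(build_image_tool(quality="high", size="1024x1024", output_format="png"))
    try:
        build_image_tool(size="1024x1204")
    except ValueError as err:
        print(err)
```

Options that are left out are simply omitted from the dictionary, so the documented defaults apply.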

<CodeGroup>
  ```bash CLI theme={null}
  mka1 llm responses create --body '{
    "model": "auto",
    "input": "Generate an image of a sunset over a mountain lake.",
    "tools": [
      {
        "type": "image_generation",
        "model": "auto",
        "quality": "high",
        "size": "1024x1024",
        "output_format": "png"
      }
    ]
  }'
  ```

  ```ts MKA1 SDK theme={null}
  import { SDK } from '@meetkai/mka1';

  const mka1 = new SDK({
    bearerAuth: `Bearer ${YOUR_API_KEY}`,
  });

  const result = await mka1.llm.responses.create({
    model: 'auto',
    input: 'Generate an image of a sunset over a mountain lake.',
    tools: [
      {
        type: 'image_generation',
        model: 'auto',
        quality: 'high',
        size: '1024x1024',
        output_format: 'png',
      },
    ],
  }, { headers: { 'X-On-Behalf-Of': '<end-user-id>' } });

  // The output includes an image_generation_call item with a result URL
  const imageCall = result.output.find((item) => item.type === 'image_generation_call');
  console.log('Image URL:', imageCall?.result);
  ```

  ```ts OpenAI SDK theme={null}
  import OpenAI from 'openai';

  const openai = new OpenAI({
    apiKey: '<mka1-api-key>',
    baseURL: 'https://apigw.mka1.com/api/v1/llm/',
    defaultHeaders: { 'X-On-Behalf-Of': '<end-user-id>' },
  });

  const response = await openai.responses.create({
    model: 'auto',
    input: 'Generate an image of a sunset over a mountain lake.',
    tools: [{ type: 'image_generation' }],
    stream: false,
  });

  const imageCall = response.output.find((item) => item.type === 'image_generation_call');
  console.log('Image URL:', imageCall?.result);
  ```

  ```csharp C# SDK theme={null}
  var result = await sdk.Llm.Responses.CreateAsync(new ResponsesCreateRequest()
  {
      Model = "auto",
      Input = ResponsesCreateRequestInput.CreateStr(
          "Generate an image of a sunset over a mountain lake."),
      Tools = new List<ResponsesCreateRequestTool>
      {
          ResponsesCreateRequestTool.CreateImageGenerationToolDefinition(
              new ImageGenerationToolDefinition()
              {
                  Model = "meetkai:flux-2-klein",
                  Quality = ImageGenerationToolDefinitionQuality.High,
                  Size = ImageGenerationToolDefinitionSize.OneThousandAndTwentyFourx1024,
                  OutputFormat = OutputFormat.Png,
              }
          ),
      },
  });

  // The output includes an image_generation_call item with a result URL
  ```

  ```python Python SDK theme={null}
  result = sdk.llm.responses.create(
      model="auto",
      input="Generate an image of a sunset over a mountain lake.",
      tools=[
          {
              "type": "image_generation",
              "model": "auto",
              "quality": "high",
              "size": "1024x1024",
              "output_format": "png",
          },
      ],
  )

  # The output includes an image_generation_call item with a result URL
  for item in result.output:
      if item.type == "image_generation_call":
          print("Image URL:", item.result)
  ```

  ```bash bash theme={null}
  curl https://apigw.mka1.com/api/v1/llm/responses \
    --request POST \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <mka1-api-key>' \
    --header 'X-On-Behalf-Of: <end-user-id>' \
    --data '{
      "model": "auto",
      "input": "Generate an image of a sunset over a mountain lake.",
      "tools": [
        {
          "type": "image_generation",
          "model": "auto",
          "quality": "high",
          "size": "1024x1024",
          "output_format": "png"
        }
      ]
    }'
  ```
</CodeGroup>

The response includes an `image_generation_call` item with the generated image URL and the revised prompt used by the image model:

```json theme={null}
{
  "status": "completed",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "I'll generate an image of a beautiful sunset over a mountain lake for you."
        }
      ]
    },
    {
      "type": "image_generation_call",
      "id": "ig_abc123",
      "status": "completed",
      "result": "<Generated Image URL>",
      "revised_prompt": "A breathtaking sunset over a pristine mountain lake, with golden and orange hues reflecting on the calm water surface. Snow-capped mountain peaks in the background, dramatic clouds in the sky with vibrant sunset colors of pink, purple, and orange.",
      "size": "auto",
      "quality": "auto",
      "output_format": "png"
    }
  ]
}
```

The `result` field contains a URL to the generated image. The `revised_prompt` shows the expanded prompt the image model used — the LLM enhances your brief instruction into a detailed image description.
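
The supported output types table lists the image output as "Image URL or base64", so a client may want to handle both forms of `result`. The following is a sketch under that assumption, using only the standard library. `image_bytes_from_result` and `save_image` are hypothetical helpers, and the response is assumed to be a plain dictionary shaped like the JSON above.

```python
import base64
import tempfile
import urllib.request
from pathlib import Path


def image_bytes_from_result(result: str, timeout: float = 30.0) -> bytes:
    """Return image bytes whether `result` is a URL or a base64 string."""
    if result.startswith(("http://", "https://")):
        with urllib.request.urlopen(result, timeout=timeout) as resp:
            return resp.read()
    return base64.b64decode(result)


def save_image(response: dict, path: Path) -> Path:
    """Write the first completed image_generation_call result to `path`."""
    item = next(
        (
            o for o in response.get("output", [])
            if o.get("type") == "image_generation_call"
            and o.get("status") == "completed"
        ),
        None,
    )
    if item is None:
        raise ValueError("no completed image_generation_call in response")
    path.write_bytes(image_bytes_from_result(item["result"]))
    return path


if __name__ == "__main__":
    # An 8-byte PNG signature stands in for a real image payload.
    png_signature = b"\x89PNG\r\n\x1a\n"
    fake = {
        "output": [
            {
                "type": "image_generation_call",
                "status": "completed",
                "result": base64.b64encode(png_signature).decode("ascii"),
            }
        ]
    }
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_image(fake, Path(tmp) / "sunset.png")
        print(saved.name, saved.stat().st_size)
```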

### Force image generation

Use `tool_choice` to ensure the model generates an image rather than responding with text only.

<CodeGroup>
  ```bash CLI theme={null}
  mka1 llm responses create --body '{
    "model": "auto",
    "input": "A red circle on a white background.",
    "tools": [{ "type": "image_generation" }],
    "tool_choice": { "type": "image_generation" }
  }'
  ```

  ```ts MKA1 SDK theme={null}
  const result = await mka1.llm.responses.create({
    model: 'auto',
    input: 'A red circle on a white background.',
    tools: [{ type: 'image_generation' }],
    toolChoice: { type: 'image_generation' },
  });
  ```

  ```ts OpenAI SDK theme={null}
  const response = await openai.responses.create({
    model: 'auto',
    input: 'A red circle on a white background.',
    tools: [{ type: 'image_generation' }],
    tool_choice: { type: 'image_generation' },
    stream: false,
  });
  ```

  ```csharp C# SDK theme={null}
  var result = await sdk.Llm.Responses.CreateAsync(new ResponsesCreateRequest()
  {
      Model = "auto",
      Input = ResponsesCreateRequestInput.CreateStr("A red circle on a white background."),
      Tools = new List<ResponsesCreateRequestTool>
      {
          ResponsesCreateRequestTool.CreateImageGenerationToolDefinition(
              new ImageGenerationToolDefinition()
          ),
      },
      ToolChoice = ToolChoice.CreateHostedToolChoice(new HostedToolChoice()
      {
          Type = HostedToolChoiceType.ImageGeneration,
      }),
  });
  ```

  ```python Python SDK theme={null}
  result = sdk.llm.responses.create(
      model="auto",
      input="A red circle on a white background.",
      tools=[{"type": "image_generation"}],
      tool_choice={"type": "image_generation"},
  )
  ```

  ```bash bash theme={null}
  curl https://apigw.mka1.com/api/v1/llm/responses \
    --request POST \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <mka1-api-key>' \
    --data '{
      "model": "auto",
      "input": "A red circle on a white background.",
      "tools": [{ "type": "image_generation" }],
      "tool_choice": { "type": "image_generation" }
    }'
  ```
</CodeGroup>

### Image output structure

The response `output` array can contain these items when an image is generated. The example above shows only the `message` and `image_generation_call` items:

1. `function_call` — the model's call to the image generation tool with the refined prompt
2. `image_generation_call` — the generation result with `status: "completed"` and `result` (image URL)
3. `function_call_output` — the raw tool output containing the URL
4. `message` — the model's text response describing or referencing the image

Image URLs expire after 1 hour. Download or cache them if you need long-term access.
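
Since a URL is only valid for about an hour, it can help to record when it was received and check before reusing it. The sketch below assumes the one-hour window starts when the response arrives, which is not stated explicitly here, and keeps a safety margin for that reason. `ImageLink` and `extract_image_link` are hypothetical names, and the output items are assumed to be plain dictionaries.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

URL_TTL = timedelta(hours=1)  # documented lifetime of an image URL


@dataclass(frozen=True)
class ImageLink:
    url: str
    received_at: datetime

    @property
    def expires_at(self) -> datetime:
        return self.received_at + URL_TTL

    def is_usable(
        self,
        now: Optional[datetime] = None,
        margin: timedelta = timedelta(minutes=5),
    ) -> bool:
        """True while the URL is still at least `margin` away from expiring."""
        current = now if now is not None else datetime.now(timezone.utc)
        return current < self.expires_at - margin


def extract_image_link(
    output: list[dict], received_at: Optional[datetime] = None
) -> Optional[ImageLink]:
    """Find the completed image_generation_call item, wherever it sits in `output`."""
    stamp = received_at if received_at is not None else datetime.now(timezone.utc)
    for item in output:
        if item.get("type") == "image_generation_call" and item.get("status") == "completed":
            return ImageLink(url=item["result"], received_at=stamp)
    return None


if __name__ == "__main__":
    output = [
        {"type": "function_call"},
        {"type": "image_generation_call", "status": "completed",
         "result": "https://example.invalid/image.png"},
        {"type": "function_call_output"},
        {"type": "message"},
    ]
    link = extract_image_link(output)
    print(link.url, link.is_usable())
```

When `is_usable()` returns `False`, the safest option is to fall back to a copy you downloaded earlier or to generate the image again.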

## Standalone APIs

MKA1 also provides standalone endpoints for speech and image generation that you can call directly, without going through the Responses API.

### Text-to-speech API

<CodeGroup>
  ```bash CLI theme={null}
  mka1 llm speech speak \
    --text 'Hello, welcome to the MKA1 platform.' \
    --language en \
    --output-file output.wav
  ```

  ```ts MKA1 SDK theme={null}
  const ttsResult = await mka1.llm.speech.speak({
    text: 'Hello, welcome to the MKA1 platform.',
    language: 'en',
  });
  ```

  ```csharp C# SDK theme={null}
  var res = await sdk.Llm.Speech.SpeakAsync(new TextToSpeechRequest()
  {
      Text = "Hello, welcome to the MKA1 platform.",
  });
  ```

  ```python Python SDK theme={null}
  result = sdk.llm.speech.speak(
      text="Hello, welcome to the MKA1 platform.",
      language="en",
  )
  ```

  ```bash bash theme={null}
  curl https://apigw.mka1.com/api/v1/llm/speech/tts \
    --request POST \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <mka1-api-key>' \
    --data '{
      "text": "Hello, welcome to the MKA1 platform.",
      "language": "en"
    }'
  ```
</CodeGroup>

### Images API

<CodeGroup>
  ```bash CLI theme={null}
  mka1 llm images create \
    --model auto \
    --prompt 'A futuristic city skyline at dusk' \
    --size 1024x1024 \
    --quality hd
  ```

  ```ts MKA1 SDK theme={null}
  const imageResult = await mka1.llm.images.generate({
    model: 'auto',
    prompt: 'A futuristic city skyline at dusk',
    size: '1024x1024',
    quality: 'hd',
  });
  ```

  ```csharp C# SDK theme={null}
  var imageResult = await sdk.Llm.Images.CreateAsync(new ImageGenerationRequest()
  {
      Model = "meetkai:z-image-turbo",
      Prompt = "A futuristic city skyline at dusk",
      Size = ImageGenerationRequestSize.OneThousandAndTwentyFourx1024,
      Quality = ImageGenerationRequestQuality.Hd,
  });
  ```

  ```python Python SDK theme={null}
  image_result = sdk.llm.images.create(
      model="auto",
      prompt="A futuristic city skyline at dusk",
      size="1024x1024",
      quality="hd",
  )
  ```

  ```bash bash theme={null}
  curl https://apigw.mka1.com/api/v1/llm/images/generations \
    --request POST \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <mka1-api-key>' \
    --data '{
      "model": "auto",
      "prompt": "A futuristic city skyline at dusk",
      "size": "1024x1024",
      "quality": "hd"
    }'
  ```
</CodeGroup>

## Next steps

* [Multimodal input](/docs/multimodal-input) — send images, audio, and documents to the model
* [Speech](/docs/speech) — transcribe audio and generate speech with the standalone speech endpoints
* [Advanced voice mode](/docs/advanced-voice-mode) — real-time voice conversations with LiveKit
* [Generate a response](/docs/generate-a-response) — text requests and multi-turn exchanges
