The MKA1 API provides a real-time voice interface through LiveKit. This guide covers how to obtain a room token, connect to a voice session, send audio and text input, and capture the agent’s responses.

Overview

The voice integration consists of three main components:
  1. Room Token: A JWT that grants access to a LiveKit room
  2. LiveKit Connection: WebRTC-based real-time communication
  3. Voice Agent: Processes audio/text input and generates spoken responses
The agent pipeline works as follows:
  • STT (Speech-to-Text): Audio is streamed via WebSocket at 16kHz and transcribed
  • LLM: Transcribed text is processed by the MKA1 Responses API
  • TTS (Text-to-Speech): LLM output is synthesized to audio at 24kHz
Every request the voice agent sends to the Responses API automatically includes "voice_mode": "true" in the request metadata. This lets you distinguish voice-originated responses from text-based ones when reviewing usage or response history.
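Since voice-originated responses carry "voice_mode": "true" in their metadata, you can filter them when reviewing response history. A minimal sketch, assuming only that a response exposes a string-valued metadata map (the `ResponseLike` shape here is illustrative, not the SDK's actual type):

```typescript
// Illustrative shape: only the metadata field matters for this check.
interface ResponseLike {
  id: string;
  metadata?: Record<string, string>;
}

// Returns true when a response originated from a voice session.
function isVoiceResponse(response: ResponseLike): boolean {
  return response.metadata?.voice_mode === 'true';
}

console.log(isVoiceResponse({ id: 'resp_1', metadata: { voice_mode: 'true' } })); // true
console.log(isVoiceResponse({ id: 'resp_2' })); // false
```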

Getting a room token

To start a voice session, first request a room token from the MKA1 API. The token endpoint requires an API key and optionally accepts X-On-Behalf-Of to identify end users. See Authentication for details.
import { SDK } from '@meetkai/mka1';

const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    reasoning: { effort: 'none' }
  }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional

console.log(session.token);    // JWT token
console.log(session.url);      // WebSocket URL
console.log(session.roomName); // Room name

Parameters

The request body has two top-level objects:
{
  "llm": { ... },   // Required — LLM configuration
  "stt": { ... }    // Optional — speech-to-text tuning
}

llm — LLM configuration (required)

The llm object accepts the same fields as the Responses API request body, minus fields managed by the voice agent (input, stream, store, background).
Field | Required | Description
----- | -------- | -----------
model | Yes | LLM model to use (e.g., meetkai:qwen3.5-35b-a3b)
instructions | No | Custom system instructions for the agent
previous_response_id | No | Chain this session to a specific response from a previous session
conversation | No | Continue an existing conversation: pass { "id": "conv_abc123..." } or the conversation ID as a string
tools | No | Array of tool definitions (function, web_search, file_search, etc.)
tool_choice | No | How the model selects tools ("auto", "none", "required", or a specific tool)
parallel_tool_calls | No | Whether to allow parallel tool execution
max_tool_calls | No | Maximum number of tool calls per response (default: 30)
temperature | No | Sampling temperature (e.g., 0.7)
max_output_tokens | No | Maximum tokens in the response
reasoning | No | Reasoning configuration (e.g., { "effort": "high" }). Set { "effort": "none" } for voice sessions to minimize latency (see note below).
top_p | No | Nucleus sampling parameter
presence_penalty | No | Presence penalty for token repetition
frequency_penalty | No | Frequency penalty for token repetition
truncation | No | "auto" or "disabled"; controls context truncation
context_management | No | Context management strategies for conversation truncation
service_tier | No | "auto", "default", "flex", or "priority"
prompt | No | Reference to a prompt template and its variables
text | No | Text output configuration (format, verbosity)
metadata | No | Key-value metadata passed to the Responses API
You cannot specify both previous_response_id and conversation.
The llm configuration is embedded as metadata in the room token's JWT, which is passed as an HTTP header. Keep the total llm payload under ~8 KB; large tools arrays may need to be trimmed.
For voice sessions, disable reasoning by setting "reasoning": { "effort": "none" }. Reasoning adds thinking time before the model responds, which increases latency and creates noticeable pauses in conversation. Disabling it keeps responses fast and natural.
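Because the llm payload rides inside the JWT, it can help to check its serialized size before requesting a token. A minimal sketch of that check; the ~8 KB ceiling is the guideline noted above, and `llmConfig` is a placeholder for your own configuration:

```typescript
// Placeholder llm configuration; substitute your real model and tools.
const llmConfig = {
  model: 'meetkai:qwen3.5-35b-a3b',
  tools: [/* ...tool definitions... */],
};

// Measure the UTF-8 byte length of the serialized payload.
const payloadBytes = new TextEncoder().encode(JSON.stringify(llmConfig)).length;

if (payloadBytes > 8 * 1024) {
  console.warn(`llm payload is ${payloadBytes} bytes; consider trimming the tools array`);
}
```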

stt — Speech-to-text configuration (optional)

Controls server-side voice activity detection (VAD) and endpointing behavior.
Field | Required | Description
----- | -------- | -----------
silence_timeout_ms | No | Milliseconds of silence before finalizing speech (100–5000)
initial_silence_timeout_ms | No | Timeout before any speech is detected (1000–30000)

Advanced configuration

You can pass tools, custom instructions, and STT tuning in a single token request:
const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    instructions: 'You are a helpful travel assistant. Be concise in voice responses.',
    temperature: 0.7,
    tools: [
      {
        type: 'web_search',
        userLocation: { country: 'US' }
      },
      {
        type: 'function',
        name: 'book_flight',
        description: 'Book a flight for the user',
        parameters: {
          type: 'object',
          properties: {
            origin: { type: 'string' },
            destination: { type: 'string' },
            date: { type: 'string' }
          },
          required: ['origin', 'destination', 'date']
        }
      }
    ],
    toolChoice: 'auto'
  },
  stt: {
    silenceTimeoutMs: 500,
    initialSilenceTimeoutMs: 10000
  }
});

Response

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "url": "wss://apigw.mka1.com/api/v1/livekit",
  "roomName": "550e8400-e29b-41d4-a716-446655440000"
}
Field | Description
----- | -----------
token | JWT access token (5-minute TTL) with room join, publish, and subscribe permissions
url | LiveKit WebSocket URL to connect to
roomName | Auto-generated UUID for this session
The token includes metadata that the voice agent uses to configure the session.
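Because the token has a 5-minute TTL, it can be useful to read its expiry so you know when to refresh. A sketch that decodes the standard `exp` claim from the JWT payload (this assumes an ordinary JWT with a numeric `exp` in seconds since the epoch, which is the JWT convention rather than anything MKA1-specific):

```typescript
// Decode the JWT payload (second dot-separated segment) and read `exp`.
// No signature verification is done here; this is only for scheduling a refresh.
function tokenExpiresAt(jwt: string): Date {
  const payloadB64 = jwt.split('.')[1];
  // Convert base64url to standard base64 before decoding.
  const normalized = payloadB64.replace(/-/g, '+').replace(/_/g, '/');
  const payload = JSON.parse(atob(normalized));
  return new Date(payload.exp * 1000);
}
```

For example, `tokenExpiresAt(session.token)` gives a `Date` you can compare against `Date.now()` to decide when to request a fresh token.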

Continuing a session

To continue from a previous response:
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    previousResponseId: 'resp_abc123...'
  }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional, but must match if the original session used it
To continue an existing conversation:
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    conversation: { id: 'conv_abc123...' }
  }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional, but must match if the original session used it
When continuing a session, the API key and X-On-Behalf-Of header (if used) must match the original session. The voice agent encrypts both into the room token and passes them to all downstream MKA1 services. If they don’t match, the agent will not have access to the previous context.

Connecting to a room

Once you have a token, use the LiveKit SDK to connect to the room.
import { Room, RoomEvent, Track } from 'livekit-client';

const room = new Room();

// Connect to the room
await room.connect(session.url, session.token);

console.log('Connected to room:', room.name);

Sending audio input

The agent accepts audio input via the LiveKit room’s audio track. Audio is processed at a 16kHz sample rate.
import { createLocalAudioTrack } from 'livekit-client';

// Create a local audio track from the microphone
const audioTrack = await createLocalAudioTrack({
  echoCancellation: true,
  noiseSuppression: true,
  autoGainControl: true
});

// Publish the track to the room
await room.localParticipant.publishTrack(audioTrack);

Audio behavior

  • Voice Activity Detection (VAD): VAD is handled server-side by the MKA1 agent, not locally. The agent automatically detects when you stop speaking and begins processing.
  • Sample rate: Audio is streamed at 16kHz to the STT service.
  • Endpointing: The agent uses server-side endpointing to determine when speech ends. There is no local endpointing delay.

Sending text input

You can also send text messages directly to the agent without speaking.
// Send a text message to the agent
const message = JSON.stringify({
  type: 'user_message',
  content: 'What is the capital of France?'
});

await room.localParticipant.publishData(
  new TextEncoder().encode(message),
  { reliable: true, topic: 'lk.chat' }
);

Receiving agent responses

The agent responds in three ways:
  1. Audio output: Synthesized speech via an audio track
  2. Transcription: Text of what the agent is saying (for captions)
  3. Response metadata: Response ID and conversation ID via data channel

Subscribing to audio output

import { RoomEvent, Track } from 'livekit-client';

room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
  if (track.kind === Track.Kind.Audio && participant.identity !== room.localParticipant.identity) {
    // This is the agent's audio output
    const audioElement = track.attach();
    document.body.appendChild(audioElement);
  }
});

Receiving transcriptions

The agent publishes transcriptions of its speech. You can use these for captions or logging.
room.on(RoomEvent.TranscriptionReceived, (segments, participant) => {
  for (const segment of segments) {
    console.log(`Agent said: ${segment.text}`);
  }
});

Receiving response metadata

The agent publishes the response_id and conversation_id (if applicable) when it starts generating a response. Save the response_id to chain future sessions using previous_response_id.
room.on(RoomEvent.DataReceived, (payload, participant) => {
  if (participant.identity !== room.localParticipant.identity) {
    const data = JSON.parse(new TextDecoder().decode(payload));

    if (data.response_id) {
      console.log('Response ID:', data.response_id);
      console.log('Conversation ID:', data.conversation_id); // present if using a conversation
      // Save response_id to chain future sessions with previous_response_id
    }
  }
});

Conversation continuity

The agent supports multi-turn conversations with persistent memory. Every response is automatically assigned a response_id, while conversations must be explicitly created and managed through the Conversations API. There are two ways to continue a conversation.
llm.previous_response_id chains a new session to a specific response. The agent receives the context from that response and all prior responses in the chain. Use this when:
  • You want to continue from a specific point in a conversation
  • You’re building a linear conversation flow
  • You want to branch from a specific response
llm.conversation references a conversation created via the Conversations API. Use this when:
  • You need to manage conversation metadata (titles, tags, etc.)
  • You want to list or search past conversations
  • You’re building a chat interface with persistent conversation history
  • Multiple clients need to access the same conversation
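Since the API rejects requests that set both previous_response_id and conversation, a small client-side guard can catch the mistake before a token request is made. A sketch, using the SDK's camelCase field names; the `LlmContinuation` type is illustrative:

```typescript
// Illustrative subset of the llm config: only the continuation fields.
interface LlmContinuation {
  previousResponseId?: string;
  conversation?: { id: string } | string;
}

// Throws if both continuation mechanisms are set, mirroring the API's rule.
function assertSingleContinuation(llm: LlmContinuation): void {
  if (llm.previousResponseId && llm.conversation) {
    throw new Error('Specify only one of previousResponseId or conversation');
  }
}
```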

Starting a new session

import { Room, RoomEvent } from 'livekit-client';
import { SDK } from '@meetkai/mka1';

const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

// 1. Get a token for a new session
const session = await mka1.llm.speech.livekitToken({
  llm: { model: 'meetkai:qwen3.5-35b-a3b' }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional

// 2. Connect to the room
const room = new Room();
await room.connect(session.url, session.token);

// 3. Track the response ID when the agent responds
let lastResponseId: string | undefined;
room.on(RoomEvent.DataReceived, (payload, participant) => {
  const data = JSON.parse(new TextDecoder().decode(payload));
  if (data.response_id) {
    lastResponseId = data.response_id;
  }
});

// 4. Have a conversation...
// 5. Disconnect when done
room.disconnect();

Continuing from a previous response

Use previous_response_id to chain a new session to the last response, preserving conversation context:
// Use the same API key and X-On-Behalf-Of as the original session
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

// 1. Get a new token chained to the previous response
const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    previousResponseId: lastResponseId
  }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional, but must match if the original session used it

// 2. Connect to the new room
const room = new Room();
await room.connect(session.url, session.token);

// 3. The agent now has context from the previous session
// User: "What did I ask you earlier?"
// Agent: "You asked about the capital of France..."

Continuing from a conversation

Use conversation_id to continue an existing conversation created via the Conversations API:
// Use the same API key and X-On-Behalf-Of as the original session
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

// 1. Get a new token with the conversation ID
const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    conversation: { id: conversationId }
  }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional, but must match if the original session used it

// 2. Connect to the new room
const room = new Room();
await room.connect(session.url, session.token);

// 3. The agent now has context from the entire conversation history
When continuing a conversation, the API key and X-On-Behalf-Of header (if used) must match the original session. The context is scoped to the authenticated identity.

Handling disconnection

Tokens expire after 5 minutes. If you need longer sessions, implement reconnection logic:
room.on(RoomEvent.Disconnected, async () => {
  console.log('Disconnected from room');

  // Get a new token (savedResponseId is the last response_id you captured
  // from the data channel before disconnecting)
  const newSession = await mka1.llm.speech.livekitToken({
    llm: {
      model: 'meetkai:qwen3.5-35b-a3b',
      previousResponseId: savedResponseId
    }
  });

  // Reconnect
  await room.connect(newSession.url, newSession.token);
});

Complete example

Here’s a complete example putting it all together:
import { Room, RoomEvent, Track, createLocalAudioTrack } from 'livekit-client';
import { SDK } from '@meetkai/mka1';

async function startVoiceSession(model: string = 'meetkai:qwen3.5-35b-a3b') {
  const mka1 = new SDK({ bearerAuth: `Bearer ${YOUR_API_KEY}` });

  // Get room credentials
  const session = await mka1.llm.speech.livekitToken({ llm: { model } });

  // Create and connect to room
  const room = new Room();

  let lastResponseId: string | undefined;

  // Handle agent audio output
  room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
    if (track.kind === Track.Kind.Audio) {
      const audio = track.attach();
      document.body.appendChild(audio);
    }
  });

  // Handle transcriptions
  room.on(RoomEvent.TranscriptionReceived, (segments) => {
    for (const segment of segments) {
      console.log('Agent:', segment.text);
    }
  });

  // Handle response metadata
  room.on(RoomEvent.DataReceived, (payload, participant) => {
    const data = JSON.parse(new TextDecoder().decode(payload));
    if (data.response_id) {
      lastResponseId = data.response_id;
    }
  });

  // Connect to the room
  await room.connect(session.url, session.token);

  // Capture and publish microphone
  const audioTrack = await createLocalAudioTrack({
    echoCancellation: true,
    noiseSuppression: true
  });
  await room.localParticipant.publishTrack(audioTrack);

  // The agent will greet you automatically
  // Start speaking to interact!

  return { room, getLastResponseId: () => lastResponseId };
}

Error handling

Token endpoint errors

These are returned as HTTP responses when requesting a room token:
Error | Cause | Solution
----- | ----- | --------
400 Bad Request | Missing required llm.model parameter | Include model inside the llm object
400 Bad Request | Both previous_response_id and conversation specified | Use only one, not both
401 Unauthorized | Invalid or missing API key | Check that your API key is valid
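One way to surface these statuses to users is a small helper that maps them to the actions in the table. This is purely illustrative; the exact error object the SDK throws may differ, so adapt the status extraction to your SDK version:

```typescript
// Map an HTTP status from the token endpoint to a suggested remedy.
// The statuses and remedies come from the table above.
function tokenErrorHint(status: number): string {
  switch (status) {
    case 400:
      return 'Check llm.model, and set only one of previous_response_id / conversation';
    case 401:
      return 'Check that your API key is valid';
    default:
      return `Unexpected status ${status}; inspect the response body`;
  }
}
```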

In-session errors

During an active voice session, the agent publishes errors via the LiveKit data channel. Listen for them alongside response metadata:
room.on(RoomEvent.DataReceived, (payload, participant) => {
  if (participant.identity === room.localParticipant.identity) return;

  const data = JSON.parse(new TextDecoder().decode(payload));

  if (data.error) {
    console.error(`[${data.error.service}] ${data.error.code}: ${data.error.message}`);
    // data.error.details may contain additional debugging info
  }

  if (data.response_id) {
    lastResponseId = data.response_id;
  }
});
The error payload structure:
{
  "error": {
    "code": "rate_limited",
    "message": "HTTP 429",
    "service": "llm",
    "details": "..."
  }
}
Field | Description
----- | -----------
code | Error code (see table below)
message | Short description of the error
service | Which part of the pipeline failed: llm, stt, or tts
details | Additional context for debugging (optional)
Error codes:
Code | Service | Cause
---- | ------- | -----
invalid_session | - | Missing required metadata fields (sub, llm)
auth_error | - | Failed to decrypt credentials from the room token
session_error | - | Agent failed to start the voice session
invalid_request | llm | Bad request to the Responses API (HTTP 400)
auth_error | llm | Invalid API key (HTTP 401)
access_denied | llm | Insufficient permissions (HTTP 403)
rate_limited | llm | Rate limit exceeded (HTTP 429)
service_error | llm | Internal server error (HTTP 500)
service_unavailable | llm | Upstream unavailable (HTTP 502/503)
timeout | llm | Request timed out (HTTP 504)
connection_error | llm | Failed to connect to the Responses API
transcription_error | stt | Speech-to-text processing failed
speech_error | tts | Text-to-speech synthesis failed

Connection errors

Issue | Cause | Solution
----- | ----- | --------
Connection timeout | Network issues or invalid token | Get a fresh token and retry
Token expired | Session exceeded 5 minutes | Get a new token with previous_response_id to continue

Next steps