The MKA1 API provides a real-time voice interface through LiveKit. This guide covers how to obtain a room token, connect to a voice session, send audio and text input, and capture the agent’s responses.
## Overview
The voice integration consists of three main components:
- Room Token: A JWT that grants access to a LiveKit room
- LiveKit Connection: WebRTC-based real-time communication
- Voice Agent: Processes audio/text input and generates spoken responses
The agent pipeline works as follows:
1. STT (Speech-to-Text): Audio is streamed via WebSocket at 16kHz and transcribed
2. LLM: Transcribed text is processed by the MKA1 Responses API
3. TTS (Text-to-Speech): LLM output is synthesized to audio at 24kHz
Every request the voice agent sends to the Responses API automatically includes "voice_mode": "true" in the request metadata. This lets you distinguish voice-originated responses from text-based ones when reviewing usage or response history.
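Because the flag arrives as the string "true" rather than a boolean, comparisons against a boolean will silently fail. A small guard like the following can help when filtering stored responses; the metadata shape here is illustrative, not part of the documented API surface.

```typescript
// Illustrative shape of a response's metadata map; the Responses API
// stores metadata values as strings.
type ResponseMetadata = Record<string, string>;

// The voice agent sets voice_mode to the *string* "true", so compare
// against the string rather than a boolean.
function isVoiceResponse(metadata: ResponseMetadata | undefined): boolean {
  return metadata?.voice_mode === 'true';
}
```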
## Getting a room token
To start a voice session, first request a room token from the MKA1 API. The token endpoint requires an API key and optionally accepts X-On-Behalf-Of to identify end users. See Authentication for details.
```typescript
import { SDK } from '@meetkai/mka1';

const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    reasoning: { effort: 'none' }
  }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional

console.log(session.token);    // JWT token
console.log(session.url);      // WebSocket URL
console.log(session.roomName); // Room name
```
### Parameters
The request body has two top-level objects:
```json
{
  "llm": { ... },  // Required — LLM configuration
  "stt": { ... }   // Optional — speech-to-text tuning
}
```
#### llm — LLM configuration (required)
The llm object accepts the same fields as the Responses API request body, minus fields managed by the voice agent (input, stream, store, background).
| Field | Required | Description |
|---|---|---|
| model | Yes | LLM model to use (e.g., meetkai:qwen3.5-35b-a3b) |
| instructions | No | Custom system instructions for the agent |
| previous_response_id | No | Chain this session to a specific response from a previous session |
| conversation | No | Continue an existing conversation — pass { "id": "conv_abc123..." } or the conversation ID as a string |
| tools | No | Array of tool definitions (function, web_search, file_search, etc.) |
| tool_choice | No | How the model selects tools ("auto", "none", "required", or a specific tool) |
| parallel_tool_calls | No | Whether to allow parallel tool execution |
| max_tool_calls | No | Maximum number of tool calls per response (default: 30) |
| temperature | No | Sampling temperature (e.g., 0.7) |
| max_output_tokens | No | Maximum tokens in the response |
| reasoning | No | Reasoning configuration (e.g., { "effort": "high" }). Set { "effort": "none" } for voice sessions to minimize latency — see note below. |
| top_p | No | Nucleus sampling parameter |
| presence_penalty | No | Presence penalty for token repetition |
| frequency_penalty | No | Frequency penalty for token repetition |
| truncation | No | "auto" or "disabled" — controls context truncation |
| context_management | No | Context management strategies for conversation truncation |
| service_tier | No | "auto", "default", "flex", or "priority" |
| prompt | No | Reference to a prompt template and its variables |
| text | No | Text output configuration (format, verbosity) |
| metadata | No | Key-value metadata passed to the Responses API |
You cannot specify both previous_response_id and conversation.
The token metadata is embedded in a JWT, which is passed as an HTTP header. Keep the total llm payload under ~8 KB — large tools arrays may need to be trimmed.
For voice sessions, disable reasoning by setting "reasoning": { "effort": "none" }. Reasoning adds thinking time before the model responds, which increases latency and creates noticeable pauses in conversation. Disabling it keeps responses fast and natural.
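Because the llm payload rides inside the JWT, it is cheap to check its serialized size before requesting a token. The sketch below applies the ~8 KB guidance from above as a client-side guard; the exact server-side limit is not specified here, so treat the threshold as an assumption.

```typescript
// Serialized size, in bytes, of an llm payload once JSON-encoded.
function llmPayloadBytes(llm: object): number {
  return new TextEncoder().encode(JSON.stringify(llm)).length;
}

// Throw before requesting a token if the payload is likely too large to
// fit in the JWT metadata (~8 KB per the guidance above).
function assertPayloadFits(llm: object, maxBytes = 8 * 1024): void {
  const size = llmPayloadBytes(llm);
  if (size > maxBytes) {
    throw new Error(
      `llm payload is ${size} bytes (limit ~${maxBytes}); trim the tools array or instructions`
    );
  }
}
```

Calling assertPayloadFits right before livekitToken turns a hard-to-debug token failure into an immediate, descriptive error.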
#### stt — Speech-to-text configuration (optional)
Controls server-side voice activity detection (VAD) and endpointing behavior.
| Field | Required | Description |
|---|---|---|
| silence_timeout_ms | No | Milliseconds of silence before finalizing speech (100–5000) |
| initial_silence_timeout_ms | No | Timeout before any speech is detected (1000–30000) |
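The documented ranges can be enforced client-side before requesting a token. This helper simply clamps values to the bounds in the table above; the camelCase field names follow the SDK style used in the examples below.

```typescript
// Clamp a value into a closed range.
const clamp = (value: number, min: number, max: number): number =>
  Math.min(Math.max(value, min), max);

// Apply the documented bounds: silence_timeout_ms 100–5000,
// initial_silence_timeout_ms 1000–30000. Omitted fields stay omitted.
function normalizeSttConfig(stt: {
  silenceTimeoutMs?: number;
  initialSilenceTimeoutMs?: number;
}) {
  return {
    ...(stt.silenceTimeoutMs !== undefined && {
      silenceTimeoutMs: clamp(stt.silenceTimeoutMs, 100, 5000),
    }),
    ...(stt.initialSilenceTimeoutMs !== undefined && {
      initialSilenceTimeoutMs: clamp(stt.initialSilenceTimeoutMs, 1000, 30000),
    }),
  };
}
```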
### Advanced configuration
You can pass tools, custom instructions, and STT tuning in a single token request:
```typescript
const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    instructions: 'You are a helpful travel assistant. Be concise in voice responses.',
    temperature: 0.7,
    tools: [
      {
        type: 'web_search',
        userLocation: { country: 'US' }
      },
      {
        type: 'function',
        name: 'book_flight',
        description: 'Book a flight for the user',
        parameters: {
          type: 'object',
          properties: {
            origin: { type: 'string' },
            destination: { type: 'string' },
            date: { type: 'string' }
          },
          required: ['origin', 'destination', 'date']
        }
      }
    ],
    toolChoice: 'auto'
  },
  stt: {
    silenceTimeoutMs: 500,
    initialSilenceTimeoutMs: 10000
  }
});
```
### Response

```json
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "url": "wss://apigw.mka1.com/api/v1/livekit",
  "roomName": "550e8400-e29b-41d4-a716-446655440000"
}
```
| Field | Description |
|---|---|
| token | JWT access token (5-minute TTL) with room join, publish, and subscribe permissions |
| url | LiveKit WebSocket URL to connect to |
| roomName | Auto-generated UUID for this session |
The token includes metadata that the voice agent uses to configure the session.
### Continuing a session
To continue from a previous response:
```typescript
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    previousResponseId: 'resp_abc123...'
  }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional, but must match if the original session used it
```
To continue an existing conversation:
```typescript
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    conversation: { id: 'conv_abc123...' }
  }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional, but must match if the original session used it
```
When continuing a session, the API key and X-On-Behalf-Of header (if used) must match the original session. The voice agent encrypts both into the room token and passes them to all downstream MKA1 services. If they don’t match, the agent will not have access to the previous context.
## Connecting to a room
Once you have a token, use the LiveKit SDK to connect to the room.
```typescript
import { Room, RoomEvent, Track } from 'livekit-client';

const room = new Room();

// Connect to the room
await room.connect(session.url, session.token);
console.log('Connected to room:', room.name);
```
The agent accepts audio input via the LiveKit room's audio track and processes it at a 16kHz sample rate.
```typescript
import { createLocalAudioTrack } from 'livekit-client';

// Create a local audio track from the microphone
const audioTrack = await createLocalAudioTrack({
  echoCancellation: true,
  noiseSuppression: true,
  autoGainControl: true
});

// Publish the track to the room
await room.localParticipant.publishTrack(audioTrack);
```
### Audio behavior
- Voice Activity Detection (VAD): VAD is handled server-side by the MKA1 agent, not locally. The agent automatically detects when you stop speaking and begins processing.
- Sample rate: Audio is streamed at 16kHz to the STT service.
- Endpointing: The agent uses server-side endpointing to determine when speech ends. There is no local endpointing delay.
## Sending text input
You can also send text messages directly to the agent without speaking.
```typescript
// Send a text message to the agent
const message = JSON.stringify({
  type: 'user_message',
  content: 'What is the capital of France?'
});

await room.localParticipant.publishData(
  new TextEncoder().encode(message),
  { reliable: true, topic: 'lk.chat' }
);
```
## Receiving agent responses
The agent responds in three ways:
- Audio output: Synthesized speech via an audio track
- Transcription: Text of what the agent is saying (for captions)
- Response metadata: Response ID and conversation ID via data channel
### Subscribing to audio output

```typescript
import { RoomEvent, Track } from 'livekit-client';

room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
  if (track.kind === Track.Kind.Audio && participant.identity !== room.localParticipant.identity) {
    // This is the agent's audio output
    const audioElement = track.attach();
    document.body.appendChild(audioElement);
  }
});
```
### Receiving transcriptions
The agent publishes transcriptions of its speech. You can use these for captions or logging.
```typescript
room.on(RoomEvent.TranscriptionReceived, (segments, participant) => {
  for (const segment of segments) {
    console.log(`Agent said: ${segment.text}`);
  }
});
```
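Transcription segments for one utterance typically arrive incrementally and share an id, with interim text replaced until a final segment lands. A small accumulator can turn the event stream into stable caption text; the CaptionSegment type below is a minimal local mirror of the id/text/final fields on livekit-client's TranscriptionSegment, not an import.

```typescript
// Minimal local shape of a transcription segment; mirrors the id/text/final
// fields of livekit-client's TranscriptionSegment.
interface CaptionSegment {
  id: string;
  text: string;
  final: boolean;
}

class CaptionBuffer {
  private segments = new Map<string, CaptionSegment>();

  // Interim segments with the same id overwrite each other; once a
  // segment is final, that utterance's text is frozen.
  update(incoming: CaptionSegment[]): void {
    for (const seg of incoming) {
      const existing = this.segments.get(seg.id);
      if (existing?.final) continue; // never downgrade a final segment
      this.segments.set(seg.id, seg);
    }
  }

  // Current caption: all segments joined in arrival order.
  text(): string {
    return [...this.segments.values()].map((s) => s.text).join(' ');
  }
}
```

Wire it into the handler above by calling buffer.update(segments) and rendering buffer.text().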
### Receiving response metadata

The agent publishes the response_id and conversation_id (if applicable) when it starts generating a response. Save the response_id to chain future sessions using previous_response_id.
```typescript
room.on(RoomEvent.DataReceived, (payload, participant) => {
  if (participant.identity !== room.localParticipant.identity) {
    const data = JSON.parse(new TextDecoder().decode(payload));
    if (data.response_id) {
      console.log('Response ID:', data.response_id);
      console.log('Conversation ID:', data.conversation_id); // present if using a conversation
      // Save response_id to chain future sessions with previous_response_id
    }
  }
});
```
## Conversation continuity
The agent supports multi-turn conversations with persistent memory. Every response is automatically assigned a response_id, while conversations must be explicitly created and managed through the Conversations API.
There are two ways to continue a conversation:
llm.previous_response_id chains a new session to a specific response. The agent receives the context from that response and all prior responses in the chain. Use this when:
- You want to continue from a specific point in a conversation
- You’re building a linear conversation flow
- You want to branch from a specific response
llm.conversation references a conversation created via the Conversations API. Use this when:
- You need to manage conversation metadata (titles, tags, etc.)
- You want to list or search past conversations
- You’re building a chat interface with persistent conversation history
- Multiple clients need to access the same conversation
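Because previous_response_id and conversation are mutually exclusive, a small builder can enforce that rule at the call site rather than waiting for the API's 400 response. This is a sketch; the camelCase field names follow the SDK style used in the examples in this guide.

```typescript
interface ContinuationOptions {
  previousResponseId?: string;
  conversationId?: string;
}

// Build the continuation fields of an llm token payload, rejecting the
// invalid combination up front (the API returns 400 if both are set).
function buildContinuation(opts: ContinuationOptions): Record<string, unknown> {
  if (opts.previousResponseId && opts.conversationId) {
    throw new Error('Specify previousResponseId or conversationId, not both');
  }
  if (opts.previousResponseId) {
    return { previousResponseId: opts.previousResponseId };
  }
  if (opts.conversationId) {
    return { conversation: { id: opts.conversationId } };
  }
  return {}; // fresh session
}
```

Spread the result into the llm object alongside model when requesting a token.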
### Starting a new session
```typescript
import { Room, RoomEvent } from 'livekit-client';
import { SDK } from '@meetkai/mka1';

const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

// 1. Get a token for a new session
const session = await mka1.llm.speech.livekitToken({
  llm: { model: 'meetkai:qwen3.5-35b-a3b' }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional

// 2. Connect to the room
const room = new Room();
await room.connect(session.url, session.token);

// 3. Track the response ID when the agent responds
let lastResponseId: string;
room.on(RoomEvent.DataReceived, (payload, participant) => {
  const data = JSON.parse(new TextDecoder().decode(payload));
  if (data.response_id) {
    lastResponseId = data.response_id;
  }
});

// 4. Have a conversation...

// 5. Disconnect when done
room.disconnect();
```
### Continuing from a previous response
Use previous_response_id to chain a new session to the last response, preserving conversation context:
```typescript
// Use the same API key and X-On-Behalf-Of as the original session
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

// 1. Get a new token chained to the previous response
const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    previousResponseId: lastResponseId
  }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional, but must match if the original session used it

// 2. Connect to the new room
const room = new Room();
await room.connect(session.url, session.token);

// 3. The agent now has context from the previous session
// User: "What did I ask you earlier?"
// Agent: "You asked about the capital of France..."
```
### Continuing from a conversation

Use the conversation field with a conversation ID to continue an existing conversation created via the Conversations API:
```typescript
// Use the same API key and X-On-Behalf-Of as the original session
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

// 1. Get a new token with the conversation ID
const session = await mka1.llm.speech.livekitToken({
  llm: {
    model: 'meetkai:qwen3.5-35b-a3b',
    conversation: { id: conversationId }
  }
}, { headers: { 'X-On-Behalf-Of': 'user-123' } }); // Optional, but must match if the original session used it

// 2. Connect to the new room
const room = new Room();
await room.connect(session.url, session.token);

// 3. The agent now has context from the entire conversation history
```
When continuing a conversation, the API key and X-On-Behalf-Of header (if used) must match the original session. The context is scoped to the authenticated identity.
### Handling disconnection
Tokens expire after 5 minutes. If you need longer sessions, implement reconnection logic:
```typescript
room.on(RoomEvent.Disconnected, async () => {
  console.log('Disconnected from room');

  // Get a new token (continuing from the last response)
  const newSession = await mka1.llm.speech.livekitToken({
    llm: {
      model: 'meetkai:qwen3.5-35b-a3b',
      previousResponseId: savedResponseId
    }
  });

  // Reconnect
  await room.connect(newSession.url, newSession.token);
});
```
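Instead of waiting for the disconnect, you can refresh proactively by reading the token's exp claim and scheduling a reconnect shortly before the 5-minute TTL elapses. The sketch below assumes a standard three-part JWT with a base64url-encoded payload and uses Node.js's Buffer for decoding (in a browser, substitute atob); the 30-second lead time is an arbitrary choice.

```typescript
// Milliseconds until the JWT's exp claim, relative to `now`.
// Assumes a standard three-part JWT with a base64url-encoded payload.
function msUntilExpiry(token: string, now: number = Date.now()): number {
  const payloadPart = token.split('.')[1];
  const payload = JSON.parse(Buffer.from(payloadPart, 'base64url').toString('utf8'));
  return payload.exp * 1000 - now;
}

// Schedule a refresh callback 30 seconds before expiry (never in the past).
function scheduleRefresh(
  token: string,
  refresh: () => void
): ReturnType<typeof setTimeout> {
  const leadMs = 30_000;
  return setTimeout(refresh, Math.max(0, msUntilExpiry(token) - leadMs));
}
```

In the refresh callback, request a new token with previousResponseId and reconnect as shown above.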
## Complete example
Here’s a complete example putting it all together:
```typescript
import { Room, RoomEvent, Track, createLocalAudioTrack } from 'livekit-client';
import { SDK } from '@meetkai/mka1';

async function startVoiceSession(model: string = 'meetkai:qwen3.5-35b-a3b') {
  const mka1 = new SDK({ bearerAuth: `Bearer ${YOUR_API_KEY}` });

  // Get room credentials
  const session = await mka1.llm.speech.livekitToken({ llm: { model } });

  // Create and connect to room
  const room = new Room();
  let lastResponseId: string | undefined;

  // Handle agent audio output
  room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
    if (track.kind === Track.Kind.Audio) {
      const audio = track.attach();
      document.body.appendChild(audio);
    }
  });

  // Handle transcriptions
  room.on(RoomEvent.TranscriptionReceived, (segments) => {
    for (const segment of segments) {
      console.log('Agent:', segment.text);
    }
  });

  // Handle response metadata
  room.on(RoomEvent.DataReceived, (payload, participant) => {
    const data = JSON.parse(new TextDecoder().decode(payload));
    if (data.response_id) {
      lastResponseId = data.response_id;
    }
  });

  // Connect to the room
  await room.connect(session.url, session.token);

  // Capture and publish microphone
  const audioTrack = await createLocalAudioTrack({
    echoCancellation: true,
    noiseSuppression: true
  });
  await room.localParticipant.publishTrack(audioTrack);

  // The agent will greet you automatically
  // Start speaking to interact!

  return { room, getLastResponseId: () => lastResponseId };
}
```
## Error handling
### Token endpoint errors
These are returned as HTTP responses when requesting a room token:
| Error | Cause | Solution |
|---|---|---|
| 400 Bad Request | Missing required llm.model parameter | Include model inside the llm object |
| 400 Bad Request | Both previous_response_id and conversation specified | Use only one, not both |
| 401 Unauthorized | Invalid or missing API key | Check your API key is valid |
### In-session errors
During an active voice session, the agent publishes errors via the LiveKit data channel. Listen for them alongside response metadata:
```typescript
room.on(RoomEvent.DataReceived, (payload, participant) => {
  if (participant.identity === room.localParticipant.identity) return;

  const data = JSON.parse(new TextDecoder().decode(payload));
  if (data.error) {
    console.error(`[${data.error.service}] ${data.error.code}: ${data.error.message}`);
    // data.error.details may contain additional debugging info
  }
  if (data.response_id) {
    lastResponseId = data.response_id;
  }
});
```
The error payload structure:
```json
{
  "error": {
    "code": "rate_limited",
    "message": "HTTP 429",
    "service": "llm",
    "details": "..."
  }
}
```
| Field | Description |
|---|---|
| code | Error code (see table below) |
| message | Short description of the error |
| service | Which part of the pipeline failed: llm, stt, or tts |
| details | Additional context for debugging (optional) |
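Since error payloads and response metadata share the same data channel, a type guard keeps the handler tidy. The interface below mirrors the payload structure above; only code and message are checked, because some session-level errors arrive without a service field.

```typescript
// Shape of an in-session error payload, per the structure above.
interface AgentError {
  code: string;
  message: string;
  service?: 'llm' | 'stt' | 'tts' | string;
  details?: string;
}

// Narrow a decoded data-channel message to one carrying an error.
function isAgentError(data: unknown): data is { error: AgentError } {
  if (typeof data !== 'object' || data === null) return false;
  const err = (data as { error?: unknown }).error;
  return (
    typeof err === 'object' &&
    err !== null &&
    typeof (err as AgentError).code === 'string' &&
    typeof (err as AgentError).message === 'string'
  );
}
```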
Error codes:
| Code | Service | Cause |
|---|---|---|
| invalid_session | — | Missing required metadata fields (sub, llm) |
| auth_error | — | Failed to decrypt credentials from the room token |
| session_error | — | Agent failed to start the voice session |
| invalid_request | llm | Bad request to the Responses API (HTTP 400) |
| auth_error | llm | Invalid API key (HTTP 401) |
| access_denied | llm | Insufficient permissions (HTTP 403) |
| rate_limited | llm | Rate limit exceeded (HTTP 429) |
| service_error | llm | Internal server error (HTTP 500) |
| service_unavailable | llm | Upstream unavailable (HTTP 502/503) |
| timeout | llm | Request timed out (HTTP 504) |
| connection_error | llm | Failed to connect to the Responses API |
| transcription_error | stt | Speech-to-text processing failed |
| speech_error | tts | Text-to-speech synthesis failed |
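A reasonable client policy, offered here as an assumption rather than part of the API contract, is to retry the transient codes with backoff and surface the rest to the user immediately:

```typescript
// Codes from the table above that typically indicate a transient
// condition worth retrying; auth and request errors are not retryable.
const RETRYABLE_CODES = new Set([
  'rate_limited',
  'timeout',
  'service_unavailable',
  'connection_error',
  'service_error',
]);

function isRetryable(code: string): boolean {
  return RETRYABLE_CODES.has(code);
}

// Exponential backoff delay for the nth retry (0-based), capped at 10 s.
function backoffMs(attempt: number, baseMs = 500, maxMs = 10_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```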
### Connection errors
| Issue | Cause | Solution |
|---|---|---|
| Connection timeout | Network issues or invalid token | Get a fresh token and retry |
| Token expired | Session exceeded 5 minutes | Get a new token with previous_response_id to continue |
## Next steps