Build real-time voice sessions with the MKA1 API using LiveKit. Configure LLM options, tools, STT tuning, and conversation continuity.
The MKA1 API provides a real-time voice interface through LiveKit. This guide covers how to obtain a room token, connect to a voice session, send audio and text input, and capture the agent’s responses.
The voice integration consists of three main components:

- **Room Token**: A JWT that grants access to a LiveKit room
- **LiveKit Connection**: WebRTC-based real-time communication
- **Voice Agent**: Processes audio/text input and generates spoken responses
The agent pipeline works as follows:

1. **STT (Speech-to-Text)**: Audio is streamed via WebSocket at 16kHz and transcribed
2. **LLM**: Transcribed text is processed by the MKA1 Responses API
3. **TTS (Text-to-Speech)**: LLM output is synthesized to audio at 24kHz
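As a rough capacity check, the raw PCM bandwidth implied by these sample rates can be sketched as below. The 16-bit mono assumption is ours, not from this guide, and actual WebRTC traffic is Opus-compressed, so these are upper bounds for uncompressed audio:

```typescript
// Raw PCM bandwidth implied by the pipeline's sample rates.
// Assumes 16-bit (2-byte) mono samples; real traffic is compressed.
const BYTES_PER_SAMPLE = 2;

function pcmBytesPerSecond(sampleRateHz: number, channels = 1): number {
  return sampleRateHz * BYTES_PER_SAMPLE * channels;
}

const sttBandwidth = pcmBytesPerSecond(16_000); // 32,000 B/s into STT
const ttsBandwidth = pcmBytesPerSecond(24_000); // 48,000 B/s out of TTS
console.log({ sttBandwidth, ttsBandwidth });
```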
Every request the voice agent sends to the Responses API automatically includes `"voice_mode": "true"` in the request metadata. This lets you distinguish voice-originated responses from text-based ones when reviewing usage or response history.
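For example, when reviewing stored responses you could separate voice sessions from text ones by checking this flag. Only the `metadata.voice_mode` field below comes from this guide; the rest of the record shape is a placeholder for the sketch:

```typescript
// Illustrative response records; only `metadata.voice_mode` is documented
// behavior, the surrounding shape is a placeholder.
interface StoredResponse {
  id: string;
  metadata?: Record<string, string>;
}

function isVoiceResponse(r: StoredResponse): boolean {
  // The voice agent sets "voice_mode": "true" (a string, not a boolean).
  return r.metadata?.voice_mode === 'true';
}

const history: StoredResponse[] = [
  { id: 'resp_1', metadata: { voice_mode: 'true' } },
  { id: 'resp_2', metadata: {} },
  { id: 'resp_3' },
];

const voiceOnly = history.filter(isVoiceResponse).map((r) => r.id);
console.log(voiceOnly); // ['resp_1']
```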
To start a voice session, first request a room token from the MKA1 API. The token endpoint requires an API key and optionally accepts `X-On-Behalf-Of` to identify end users. See Authentication for details.
The `llm` object accepts the same fields as the Responses API request body, minus the fields managed by the voice agent (`input`, `stream`, `store`, `background`).
| Field | Required | Description |
| --- | --- | --- |
| `model` | Yes | LLM model to use (e.g., `auto`) |
| `instructions` | No | Custom system instructions for the agent |
| `previous_response_id` | No | Chain this session to a specific response from a previous session |
| `conversation` | No | Continue an existing conversation: pass `{ "id": "conv_abc123..." }` or the conversation ID as a string |
| `tools` | No | Array of tool definitions (function, web_search, file_search, etc.) |
| `tool_choice` | No | How the model selects tools (`"auto"`, `"none"`, `"required"`, or a specific tool) |
| `parallel_tool_calls` | No | Whether to allow parallel tool execution |
| `max_tool_calls` | No | Maximum number of tool calls per response (default: 30) |
| `temperature` | No | Sampling temperature (e.g., 0.7) |
| `max_output_tokens` | No | Maximum tokens in the response |
| `reasoning` | No | Reasoning configuration (e.g., `{ "effort": "high" }`). Set `{ "effort": "none" }` for voice sessions to minimize latency; see note below. |
| `top_p` | No | Nucleus sampling parameter |
| `presence_penalty` | No | Presence penalty for token repetition |
| `frequency_penalty` | No | Frequency penalty for token repetition |
| `truncation` | No | `"auto"` or `"disabled"`; controls context truncation |
| `context_management` | No | Context management strategies for conversation truncation |
| `service_tier` | No | `"auto"`, `"default"`, `"flex"`, or `"priority"` |
| `prompt` | No | Reference to a prompt template and its variables |
| `text` | No | Text output configuration (format, verbosity) |
| `metadata` | No | Key-value metadata passed to the Responses API |
You cannot specify both `previous_response_id` and `conversation`.
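A small client-side guard can catch this before the request is sent. The `LlmOptions` shape below is a minimal sketch for illustration, not the SDK's actual type:

```typescript
// Minimal sketch of the mutually exclusive continuation fields.
interface LlmOptions {
  model: string;
  previousResponseId?: string;
  conversation?: { id: string } | string;
}

function validateContinuation(llm: LlmOptions): void {
  if (llm.previousResponseId && llm.conversation) {
    throw new Error(
      'Specify either previousResponseId or conversation, not both.'
    );
  }
}

validateContinuation({ model: 'auto', previousResponseId: 'resp_abc123...' }); // ok
validateContinuation({ model: 'auto', conversation: { id: 'conv_abc123...' } }); // ok
```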
The token metadata is embedded in a JWT, which is passed as an HTTP header. Keep the total `llm` payload under ~8 KB; large `tools` arrays may need to be trimmed.
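Since the limit is approximate, a pre-flight size check is cheap insurance. The threshold constant below is an assumption taken from the ~8 KB guideline above, not an exact server-side limit:

```typescript
// Warn before sending an llm payload that may exceed the ~8 KB
// guideline for token metadata embedded in the JWT.
const MAX_LLM_PAYLOAD_BYTES = 8 * 1024; // approximate, per the docs

function llmPayloadBytes(llm: unknown): number {
  // Byte length of the UTF-8 serialization, not the character count.
  return new TextEncoder().encode(JSON.stringify(llm)).length;
}

function checkLlmPayload(llm: unknown): void {
  const size = llmPayloadBytes(llm);
  if (size > MAX_LLM_PAYLOAD_BYTES) {
    console.warn(
      `llm payload is ${size} bytes; consider trimming the tools array.`
    );
  }
}

checkLlmPayload({ model: 'auto', tools: [] });
```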
For voice sessions, disable reasoning by setting `"reasoning": { "effort": "none" }`. Reasoning adds thinking time before the model responds, which increases latency and creates noticeable pauses in conversation. Disabling it keeps responses fast and natural.
To chain this session to a specific response from a previous session:

```typescript
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

const session = await mka1.llm.speech.livekitToken(
  { llm: { model: 'auto', previousResponseId: 'resp_abc123...' } },
  // Optional, but must match if the original session used it
  { headers: { 'X-On-Behalf-Of': 'user-123' } }
);
```
To continue an existing conversation:
```typescript
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

const session = await mka1.llm.speech.livekitToken(
  { llm: { model: 'auto', conversation: { id: 'conv_abc123...' } } },
  // Optional, but must match if the original session used it
  { headers: { 'X-On-Behalf-Of': 'user-123' } }
);
```
When continuing a session, the API key and X-On-Behalf-Of header (if used) must match the original session. The voice agent encrypts both into the room token and passes them to all downstream MKA1 services. If they don’t match, the agent will not have access to the previous context.
Once you have a token, use the LiveKit SDK to connect to the room.
```typescript
import { Room, RoomEvent, Track } from 'livekit-client';

const room = new Room();

// Connect to the room
await room.connect(session.url, session.token);
console.log('Connected to room:', room.name);
```
The agent accepts audio input via the LiveKit room’s audio track. The audio is processed at 16kHz sample rate.
```typescript
import { createLocalAudioTrack } from 'livekit-client';

// Create a local audio track from the microphone
const audioTrack = await createLocalAudioTrack({
  echoCancellation: true,
  noiseSuppression: true,
  autoGainControl: true,
});

// Publish the track to the room
await room.localParticipant.publishTrack(audioTrack);
```
- **Voice Activity Detection (VAD)**: VAD is handled server-side by the MKA1 agent, not locally. The agent automatically detects when you stop speaking and begins processing.
- **Sample rate**: Audio is streamed at 16kHz to the STT service.
- **Endpointing**: The agent uses server-side endpointing to determine when speech ends. There is no local endpointing delay.
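If you publish microphone audio through LiveKit as shown above, you do not need to resample anything yourself; the WebRTC stack handles capture rates. But if you feed raw PCM from another source, a naive downsampler can be sketched as below. This is illustrative only (linear interpolation with no anti-aliasing filter), not production audio code:

```typescript
// Naive linear-interpolation resampler for mono Float32 PCM.
// Illustrative only: real pipelines should low-pass filter before
// decimating to avoid aliasing.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  const outLength = Math.round((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const lo = Math.floor(pos);
    const hi = Math.min(lo + 1, input.length - 1);
    const frac = pos - lo;
    output[i] = input[lo] * (1 - frac) + input[hi] * frac;
  }
  return output;
}

// 48 kHz capture -> 16 kHz for STT reduces the sample count by 3x.
const oneSecondAt48k = new Float32Array(48_000);
const at16k = resampleLinear(oneSecondAt48k, 48_000, 16_000);
console.log(at16k.length); // 16000
```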
You can also send text messages directly to the agent without speaking.
```typescript
// Send a text message to the agent
const message = JSON.stringify({
  type: 'user_message',
  content: 'What is the capital of France?',
});

await room.localParticipant.publishData(
  new TextEncoder().encode(message),
  { reliable: true, topic: 'lk.chat' }
);
```
The agent publishes the `response_id` and `conversation_id` (if applicable) when it starts generating a response. Save the `response_id` to chain future sessions using `previous_response_id`.
```typescript
room.on(RoomEvent.DataReceived, (payload, participant) => {
  if (participant.identity !== room.localParticipant.identity) {
    const data = JSON.parse(new TextDecoder().decode(payload));
    if (data.response_id) {
      console.log('Response ID:', data.response_id);
      console.log('Conversation ID:', data.conversation_id); // present if using a conversation
      // Save response_id to chain future sessions with previous_response_id
    }
  }
});
```
The agent supports multi-turn conversations with persistent memory. Every response is automatically assigned a `response_id`, while conversations must be explicitly created and managed through the Conversations API.

There are two ways to continue a conversation:

`llm.previous_response_id` chains a new session to a specific response. The agent receives the context from that response and all prior responses in the chain. Use this when:
- You want to continue from a specific point in a conversation
- You're building a linear conversation flow
- You want to branch from a specific response
`llm.conversation` references a conversation created via the Conversations API. Use this when:
- You need to manage conversation metadata (titles, tags, etc.)
- You want to list or search past conversations
- You're building a chat interface with persistent conversation history
- Multiple clients need to access the same conversation
```typescript
import { Room, RoomEvent } from 'livekit-client';
import { SDK } from '@meetkai/mka1';

const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

// 1. Get a token for a new session
const session = await mka1.llm.speech.livekitToken(
  { llm: { model: 'auto' } },
  { headers: { 'X-On-Behalf-Of': 'user-123' } } // Optional
);

// 2. Connect to the room
const room = new Room();
await room.connect(session.url, session.token);

// 3. Track the response ID when the agent responds
let lastResponseId: string;
room.on(RoomEvent.DataReceived, (payload, participant) => {
  const data = JSON.parse(new TextDecoder().decode(payload));
  if (data.response_id) {
    lastResponseId = data.response_id;
  }
});

// 4. Have a conversation...

// 5. Disconnect when done
room.disconnect();
```
Use `previous_response_id` to chain a new session to the last response, preserving conversation context:
```typescript
// Use the same API key and X-On-Behalf-Of as the original session
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

// 1. Get a new token chained to the previous response
const session = await mka1.llm.speech.livekitToken(
  { llm: { model: 'auto', previousResponseId: lastResponseId } },
  // Optional, but must match if the original session used it
  { headers: { 'X-On-Behalf-Of': 'user-123' } }
);

// 2. Connect to the new room
const room = new Room();
await room.connect(session.url, session.token);

// 3. The agent now has context from the previous session
// User: "What did I ask you earlier?"
// Agent: "You asked about the capital of France..."
```
Use the `conversation` field to continue an existing conversation created via the Conversations API:
```typescript
// Use the same API key and X-On-Behalf-Of as the original session
const mka1 = new SDK({
  bearerAuth: `Bearer ${YOUR_API_KEY}`,
});

// 1. Get a new token with the conversation ID
const session = await mka1.llm.speech.livekitToken(
  { llm: { model: 'auto', conversation: { id: conversationId } } },
  // Optional, but must match if the original session used it
  { headers: { 'X-On-Behalf-Of': 'user-123' } }
);

// 2. Connect to the new room
const room = new Room();
await room.connect(session.url, session.token);

// 3. The agent now has context from the entire conversation history
```
When continuing a conversation, the API key and X-On-Behalf-Of header (if used) must match the original session. The context is scoped to the authenticated identity.