POST /api/v1/llm/speech/transcriptions
TypeScript (SDK)
import { SDK } from "@meetkai/mka1";
import { openAsBlob } from "node:fs";

const sdk = new SDK({
  bearerAuth: "<YOUR_BEARER_TOKEN_HERE>",
});

async function run() {
  const result = await sdk.llm.speech.transcribe({
    language: "en",
    prompt: "This is a technical podcast about machine learning.",
    temperature: 0.2,
    requestBody: {
      file: await openAsBlob("example.file"),
    },
  });

  console.log(result);
}

run();
Example response:

{
  "text": "Hello, this is a sample transcription of the audio file.",
  "language": "en",
  "confidence": 0.95
}

Authorizations

Authorization
string
header
required

Authenticate with your MKA1 API key at the API gateway: Authorization: Bearer <mka1-api-key>. For multi-user server-side integrations, also send X-On-Behalf-Of to identify the end user making the request.
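The two headers above can be assembled with a small helper. This is a minimal sketch: the function name and the choice to accept the end-user id as an optional argument are our own, and the placeholder key should be replaced with a real MKA1 API key.

```typescript
// Sketch: build the auth headers for a gateway request.
const MKA1_API_KEY = "<mka1-api-key>"; // placeholder; use your real key

function buildAuthHeaders(onBehalfOf?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${MKA1_API_KEY}`,
  };
  // Only multi-user server-side integrations need X-On-Behalf-Of.
  if (onBehalfOf) {
    headers["X-On-Behalf-Of"] = onBehalfOf;
  }
  return headers;
}
```

Pass the returned object as the `headers` of whatever HTTP client you use; single-user integrations simply omit the argument.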

Headers

X-On-Behalf-Of
string

Optional external user identifier for multi-user server-side integrations. Use this when acting on behalf of one of your end users.

Query Parameters

model
enum<string>
default:whisper-large-v3

The Whisper model to use for transcription. 'whisper-large-v3' (default) offers better accuracy, 'whisper-large-v3-turbo' is faster but less accurate for some languages.

Available options:
whisper-large-v3,
whisper-large-v3-turbo
Example:

"whisper-large-v3"

language
string

The language of the input audio in ISO-639-1 format (e.g., 'en', 'es', 'fr'). If not specified, the model will auto-detect the language.

Example:

"en"

prompt
string

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

Example:

"This is a technical podcast about machine learning."

temperature
number

The sampling temperature, between 0 and 1. Higher values make the output more random, lower values more focused and deterministic.

Required range: 0 <= x <= 1
Example:

0.2

Body

multipart/form-data
file
file
required

Audio file to transcribe.

File Requirements:

  • Maximum size: 25MB
  • Supported formats: FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WebM
  • Audio is automatically optimized for transcription
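The requirements above can be checked client-side before uploading, which avoids a round trip for files the API would reject. The limit and format list come from this page; the helper itself is a sketch and not part of the SDK.

```typescript
// Sketch: pre-flight validation of an audio file against the documented limits.
const MAX_BYTES = 25 * 1024 * 1024; // 25MB limit from the docs
const SUPPORTED_FORMATS = new Set([
  "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
]);

// Returns null when the file looks acceptable, or an error message otherwise.
function validateAudioFile(name: string, sizeBytes: number): string | null {
  const ext = name.split(".").pop()?.toLowerCase() ?? "";
  if (!SUPPORTED_FORMATS.has(ext)) {
    return `Unsupported format: .${ext}`;
  }
  if (sizeBytes > MAX_BYTES) {
    return `File exceeds the 25MB limit (${sizeBytes} bytes)`;
  }
  return null;
}
```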

Upload Options:

  1. Standard: Upload complete file using multipart/form-data
  2. Chunked: Stream file chunks using Transfer-Encoding: chunked header (useful for real-time recording)

Note: For chunked uploads, the server buffers all chunks before processing. Transcription begins only after the complete file is received.
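For the chunked option, the client's job is simply to emit the recording in pieces; since the server buffers everything before transcribing, chunk boundaries carry no meaning. A sketch of splitting an in-memory buffer into fixed-size chunks (the 64 KiB chunk size is an arbitrary choice, and the transport that sends them with Transfer-Encoding: chunked is left to your HTTP client):

```typescript
// Sketch: yield fixed-size chunks of a recording buffer for a chunked upload.
function* chunked(data: Uint8Array, chunkSize = 64 * 1024): Generator<Uint8Array> {
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray creates a view, so no copying happens here.
    yield data.subarray(offset, Math.min(offset + chunkSize, data.length));
  }
}
```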

Response

200 - application/json

OK

Response from the transcription endpoint containing the transcribed text, detected language, and confidence score.

text
string
required

The transcribed text from the audio file

language
string

The detected or specified language code in ISO-639-1 format (e.g., 'en', 'es', 'fr')

confidence
number

Confidence score from 0 to 1 where 1 indicates highest confidence in transcription accuracy
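The response schema above maps naturally onto a TypeScript interface with a runtime guard, useful when handling the raw JSON without the SDK. The interface and guard are a sketch derived from the field list on this page (only `text` is required):

```typescript
// Sketch: typed shape of the transcription response, per the fields above.
interface TranscriptionResponse {
  text: string;        // required: the transcribed text
  language?: string;   // ISO-639-1 code, e.g. "en"
  confidence?: number; // 0 to 1, higher is more confident
}

// Runtime guard for an untyped parsed JSON value.
function isTranscriptionResponse(v: unknown): v is TranscriptionResponse {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  if (typeof o.text !== "string") return false;
  if (o.language !== undefined && typeof o.language !== "string") return false;
  if (
    o.confidence !== undefined &&
    (typeof o.confidence !== "number" || o.confidence < 0 || o.confidence > 1)
  ) {
    return false;
  }
  return true;
}
```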