POST /api/v1/llm/speech/transcriptions
Python (SDK)
from mka1 import SDK

with SDK(
    bearer_auth="<YOUR_BEARER_TOKEN_HERE>",
) as sdk:
    # Open the audio in binary mode; the context manager closes it after the call.
    with open("example.file", "rb") as audio:
        res = sdk.llm.speech.transcribe(
            file={
                "file_name": "example.file",
                "content": audio,
            },
            model="auto",
            language="en",
            include_speaker_data=True,
            prompt="This is a technical podcast about machine learning.",
            temperature=0.2,
        )

    # Handle response
    print(res)
{
  "text": "Hello, this is a sample transcription of the audio file.",
  "language": "en",
  "confidence": 0.95,
  "speakers": [
    {
      "speaker": "Speaker-1",
      "text": "Hello, this is a sample transcription of the audio file.",
      "confidence": 0.95,
      "offset_ms": 0,
      "duration_ms": 2100
    }
  ]
}

Authorizations

Authorization
string
header
required

Gateway auth: send Authorization: Bearer <mka1-api-key>. For multi-user server-side integrations, you can also send X-On-Behalf-Of: <external-user-id>.
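
For callers that do not use the SDK, the two headers described above can be assembled as in the sketch below. The helper name `build_auth_headers` is hypothetical and not part of any published client; only the header names and the `Bearer` scheme come from this page.

```python
from typing import Dict, Optional


def build_auth_headers(api_key: str, on_behalf_of: Optional[str] = None) -> Dict[str, str]:
    """Return gateway auth headers (hypothetical helper, not an SDK function)."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {"Authorization": f"Bearer {api_key}"}
    # X-On-Behalf-Of is optional; add it only for multi-user server-side calls.
    if on_behalf_of is not None:
        headers["X-On-Behalf-Of"] = on_behalf_of
    return headers
```

Keeping the optional header out of the dictionary unless it is supplied avoids sending an empty `X-On-Behalf-Of` value on single-user requests.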

Query Parameters

model
string
default:auto

Transcription model identifier. Defaults to 'auto', which selects the best available model. Ignored when speaker diarization is requested.

Example:

"auto"

language
string

The language of the input audio in ISO-639-1 or BCP-47 format (for example 'en' or 'en-US'). If not specified, the transcription service auto-detects the language.

Example:

"en"

include_speaker_data
boolean
default:false

Whether to include speaker-segment data. Defaults to false. When true, the response includes a speakers array split by detected speaker.

Example:

true

prompt
string

Legacy prompt parameter retained for backward compatibility. Used only by the fallback path for non-WAV uploads.

Example:

"This is a technical podcast about machine learning."

temperature
number

Legacy temperature parameter retained for backward compatibility. Used only by the fallback path for non-WAV uploads.

Required range: 0 <= x <= 1
Example:

0.2
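
The query parameters above could be turned into a request URL roughly as follows. This is a sketch under assumptions: the base URL is a placeholder, `build_transcription_url` is a hypothetical helper, and serializing the boolean as lowercase `true` is an assumption based on the example value shown on this page.

```python
from typing import Optional
from urllib.parse import urlencode

TRANSCRIPTIONS_PATH = "/api/v1/llm/speech/transcriptions"


def build_transcription_url(
    base_url: str,
    model: str = "auto",
    language: Optional[str] = None,
    include_speaker_data: bool = False,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> str:
    """Build the endpoint URL with its query string (hypothetical helper)."""
    # The documented range for temperature is 0 <= x <= 1.
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must satisfy 0 <= x <= 1")
    params = {"model": model}
    if language is not None:
        params["language"] = language
    if include_speaker_data:
        # Assumption: booleans are sent as lowercase "true".
        params["include_speaker_data"] = "true"
    if prompt is not None:
        params["prompt"] = prompt
    if temperature is not None:
        params["temperature"] = str(temperature)
    return f"{base_url.rstrip('/')}{TRANSCRIPTIONS_PATH}?{urlencode(params)}"
```

Optional parameters are omitted rather than sent empty, so the service falls back to its documented defaults (language auto-detection, no speaker data).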

Body

multipart/form-data
file
file
required

Audio file to transcribe.

File Requirements:

  • Maximum size: 25 MB
  • Supported formats: FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WebM
  • Speaker diarization currently requires WAV / PCM input
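
A client-side pre-flight check against the requirements above might look like the sketch below. Several things here are assumptions: `check_audio_file` is a hypothetical helper, the limit is read conservatively as 25,000,000 bytes, and the format is guessed from the file extension, which the service itself may not rely on.

```python
import os
from typing import List

# Assumption: "25MB" is interpreted as 25 * 1000 * 1000 bytes (the stricter reading).
MAX_BYTES = 25 * 1000 * 1000
# Assumption: each supported format maps to the extension of the same name.
SUPPORTED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}


def check_audio_file(file_name: str, size_bytes: int, diarize: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(file_name)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    # Speaker diarization currently requires WAV / PCM input.
    if diarize and ext != ".wav":
        problems.append("include_speaker_data=true requires a WAV file")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues before attempting an upload.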

Upload Options:

  1. Standard: Upload the complete file using multipart/form-data
  2. Chunked: Stream file chunks using the Transfer-Encoding: chunked header (useful for real-time recording)

Note: For chunked uploads, the server buffers all chunks before processing. Transcription begins only after the complete file is received.
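
On the client side, a chunked upload usually starts from a generator that yields the audio in pieces. The sketch below shows only that generator; `iter_chunks` and the 64 KiB default are illustrative choices, and this page does not specify how the chunked body should be framed, so check that before wiring it to a real request.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # An empty read signals end of stream.
            break
        yield chunk


# Usage with an in-memory stream standing in for a live recording buffer.
demo_chunks = list(iter_chunks(io.BytesIO(b"abcdefgh"), chunk_size=3))
```

Some HTTP clients, such as `requests`, send a generator body with `Transfer-Encoding: chunked` automatically. Because the server buffers every chunk before processing, streaming reduces client memory use but does not make the transcription start sooner.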

Response

200 - application/json

OK

Response from the transcription endpoint containing the transcribed text, detected language, confidence score, and optional speaker segments.

text
string
required

The transcribed text from the audio file.

language
string

The detected or specified language code in ISO-639-1 format (e.g., 'en', 'es', 'fr').

confidence
number

Confidence score from 0 to 1, where 1 indicates the highest confidence in transcription accuracy.

speakers
object[]

Speaker diarization segments, returned only when include_speaker_data is true.
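
As an illustration of consuming the response, the sketch below formats each speaker segment with its time range. The segment field names (`speaker`, `text`, `offset_ms`, `duration_ms`) are taken from the example response on this page; `speaker_lines` is a hypothetical helper, and it treats a missing `speakers` array as empty.

```python
import json
from typing import List

# Example response copied from this page.
SAMPLE_RESPONSE = """
{
  "text": "Hello, this is a sample transcription of the audio file.",
  "language": "en",
  "confidence": 0.95,
  "speakers": [
    {
      "speaker": "Speaker-1",
      "text": "Hello, this is a sample transcription of the audio file.",
      "confidence": 0.95,
      "offset_ms": 0,
      "duration_ms": 2100
    }
  ]
}
"""


def speaker_lines(response: dict) -> List[str]:
    """Format speaker segments as '[start-end] speaker: text' lines."""
    lines = []
    # speakers is returned only when include_speaker_data is true.
    for segment in response.get("speakers") or []:
        start_s = segment["offset_ms"] / 1000
        end_s = (segment["offset_ms"] + segment["duration_ms"]) / 1000
        lines.append(
            f"[{start_s:.2f}s-{end_s:.2f}s] {segment['speaker']}: {segment['text']}"
        )
    return lines


parsed = json.loads(SAMPLE_RESPONSE)
formatted = speaker_lines(parsed)
```

Only `text` is marked required in the schema above, so code that reads `language`, `confidence`, or `speakers` should tolerate their absence.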