from mka1 import SDK

with SDK(
    bearer_auth="<YOUR_BEARER_TOKEN_HERE>",
) as sdk:
    res = sdk.llm.speech.transcribe(
        file={
            "file_name": "example.file",
            "content": open("example.file", "rb"),
        },
        model="auto",
        language="en",
        include_speaker_data=True,
        prompt="This is a technical podcast about machine learning.",
        temperature=0.2,
    )
    # Handle response
    print(res)

Example response:

{
  "text": "Hello, this is a sample transcription of the audio file.",
  "language": "en",
  "confidence": 0.95,
  "speakers": [
    {
      "speaker": "Speaker-1",
      "text": "Hello, this is a sample transcription of the audio file.",
      "confidence": 0.95,
      "offset_ms": 0,
      "duration_ms": 2100
    }
  ]
}

Convert audio to text using advanced speech recognition.
Complete File Upload (Standard)
Use Content-Type: multipart/form-data to upload the complete audio file in one request. Maximum file size: 25MB.
Example:
curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en" \
-F "file=@audio.flac"
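The same complete-file upload can be sketched in Python using only the standard library. The URL and the file form field name are taken from the curl example above; the octet-stream content type is a generic placeholder, not a documented requirement.

```python
# Sketch: single-request multipart/form-data upload with urllib.
import io
import urllib.request
import uuid

URL = "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en"

def build_multipart(field: str, filename: str, payload: bytes):
    """Return (content_type, body) for a single-file multipart upload."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
    )
    body.write(payload)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", body.getvalue()

def transcribe(path: str) -> bytes:
    """POST one complete audio file (max 25MB) and return the raw response."""
    with open(path, "rb") as f:
        ctype, body = build_multipart("file", path, f.read())
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```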
Chunked Upload (Streaming)
Use Transfer-Encoding: chunked header to stream audio data in chunks as it’s being recorded. No need to know total file size upfront. Server buffers chunks until complete before processing. Maximum total size: 25MB.
Example:
curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en" \
-H "Transfer-Encoding: chunked" \
-H "Content-Type: multipart/form-data" \
--data-binary @audio.flac
Supported Formats: FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WebM
Query Parameters:
model (optional): Transcription model identifier. Defaults to 'auto'.
language (optional): ISO-639-1 or BCP-47 language code (e.g., "en", "en-US"). Auto-detects if not specified.
prompt (optional): Legacy prompt parameter retained for backward compatibility.
temperature (optional): Legacy temperature parameter retained for backward compatibility.
include_speaker_data (optional): When true, includes speaker diarization data and requires WAV/PCM input. Otherwise transcription uses the standard compatibility path.

Response: Returns transcribed text in JSON format.
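The optional query parameters above can be assembled with the standard library; any key you omit falls back to its documented default.

```python
# Sketch: encoding the documented query parameters onto the endpoint URL.
from urllib.parse import urlencode

params = {
    "model": "auto",
    "language": "en",
    "include_speaker_data": "true",
}
url = ("http://localhost:3000/api/v1/llm/speech/transcriptions?"
       + urlencode(params))
print(url)
```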
Gateway auth: send Authorization: Bearer <mka1-api-key>. For multi-user server-side integrations, you can also send X-On-Behalf-Of: <external-user-id>.
model (string, optional)
Transcription model identifier. Defaults to 'auto', which selects the best available model. Ignored when speaker diarization is requested.
Example: "auto"

language (string, optional)
The language of the input audio in ISO-639-1 or BCP-47 format (for example 'en' or 'en-US'). If not specified, the transcription service auto-detects the language.
Example: "en"

include_speaker_data (boolean, optional)
Whether to include speaker-segment data. Defaults to false. When true, the response includes a speakers array split by detected speaker.
Example: true

prompt (string, optional)
Legacy prompt parameter retained for backward compatibility. Used only by the fallback path for non-WAV uploads.
Example: "This is a technical podcast about machine learning."

temperature (number, optional)
Legacy temperature parameter retained for backward compatibility. Used only by the fallback path for non-WAV uploads. Range: 0 <= x <= 1.
Example: 0.2
file (required)
Audio file to transcribe. Supported formats and upload options are listed above.
Note: For chunked uploads, the server buffers all chunks before processing; transcription begins only after the complete file is received.
Response (200 OK)
Response from the transcription endpoint containing the transcribed text, detected language, confidence score, and optional speaker segments.

text: The transcribed text from the audio file.
language: The detected or specified language code in ISO-639-1 format (e.g., 'en', 'es', 'fr').
confidence: Confidence score from 0 to 1, where 1 indicates the highest confidence in transcription accuracy.
speakers: Speaker diarization segments, returned only when include_speaker_data is true.
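Given the sample response shown earlier, the speakers array can be consumed like this; each segment's end time is offset_ms + duration_ms:

```python
# Sketch: extracting (speaker, start, end) spans from a response dict.
sample = {
    "text": "Hello, this is a sample transcription of the audio file.",
    "language": "en",
    "confidence": 0.95,
    "speakers": [
        {
            "speaker": "Speaker-1",
            "text": "Hello, this is a sample transcription of the audio file.",
            "confidence": 0.95,
            "offset_ms": 0,
            "duration_ms": 2100,
        },
    ],
}

def segment_spans(response: dict) -> list[tuple[str, int, int]]:
    """Return (speaker, start_ms, end_ms) per diarization segment."""
    return [
        (s["speaker"], s["offset_ms"], s["offset_ms"] + s["duration_ms"])
        for s in response.get("speakers", [])
    ]

print(segment_spans(sample))  # [('Speaker-1', 0, 2100)]
```

When include_speaker_data is false the speakers key is absent, so the helper's .get fallback returns an empty list.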