Convert audio to text using advanced speech recognition.
Complete File Upload (Standard)
Use Content-Type: multipart/form-data to upload the complete audio file in one request. Maximum file size: 25MB.
Example:
curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en" \
-F "file=@audio.flac"
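Because uploads are capped at 25MB, a client may want to check the file size before sending anything. The following is a minimal sketch, assuming the limit means 25 * 1024 * 1024 bytes (the exact byte count is not stated here). The helper names within_limit and transcribe_file are hypothetical.

```shell
# Assumption: "25MB" is read as 25 * 1024 * 1024 bytes.
MAX_BYTES=$((25 * 1024 * 1024))

# Returns 0 if the file is within the limit, 1 if it is too large,
# and 2 if it does not exist.
within_limit() {
  [ -f "$1" ] || return 2
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -le "$MAX_BYTES" ]
}

# Hypothetical wrapper: run the documented upload only after the check passes.
transcribe_file() {
  if ! within_limit "$1"; then
    echo "not uploading $1: missing or larger than 25MB" >&2
    return 1
  fi
  curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en" \
    -F "file=@$1"
}
```

Calling transcribe_file audio.flac then behaves like the example above, except that oversized or missing files are reported locally instead of being sent.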
Chunked Upload (Streaming)
Use the Transfer-Encoding: chunked header to stream audio data in chunks as it’s being recorded. The total file size does not need to be known upfront. The server buffers the chunks and starts processing only once the upload is complete. Maximum total size: 25MB.
Example:
curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions?language=en" \
-H "Transfer-Encoding: chunked" \
-H "Content-Type: multipart/form-data" \
--data-binary @audio.flac
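To send audio while it is still being produced, the request body can be read from standard input instead of a finished file. The sketch below assumes curl's -T - option (upload from stdin, sent with chunked transfer encoding over HTTP/1.1 when the length is unknown). It also assumes the endpoint accepts the same raw body and headers as the example above. The helper name stream_transcription is hypothetical.

```shell
# Reads audio from standard input and streams it to the endpoint.
# -T - makes curl upload stdin as it arrives; -X POST overrides the
# PUT method that -T would otherwise select.
# Usage: some-recorder | stream_transcription [LANGUAGE]
stream_transcription() {
  curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions?language=${1:-en}" \
    -H "Transfer-Encoding: chunked" \
    -H "Content-Type: multipart/form-data" \
    -T -
}
```

Any command that writes audio in a supported format to standard output can be piped into stream_transcription. The 25MB total limit still applies to the streamed data.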
Supported Formats: FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WebM
Query Parameters:
language (optional): ISO-639-1 language code (e.g., “en”, “es”, “fr”). Auto-detects if not specified.
prompt (optional): Text to guide transcription style.
temperature (optional): Sampling temperature 0-1 (higher = more random).
Response: Returns transcribed text in JSON format.
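The optional parameters can be combined in a single request. Free-text values such as prompt need to be URL-encoded in the query string. The sketch below delegates that to curl's --url-query option, which assumes curl 7.87.0 or newer. The helper name transcribe_with_options is hypothetical, and empty arguments are left out so that the server defaults apply.

```shell
# Usage: transcribe_with_options FILE [LANGUAGE] [PROMPT] [TEMPERATURE]
transcribe_with_options() {
  file="$1"; language="${2:-}"; prompt="${3:-}"; temperature="${4:-}"
  # Rebuild the positional parameters as the list of curl arguments.
  set -- -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions"
  # --url-query URL-encodes the value and appends name=value to the query.
  if [ -n "$language" ]; then set -- "$@" --url-query "language=${language}"; fi
  if [ -n "$prompt" ]; then set -- "$@" --url-query "prompt=${prompt}"; fi
  if [ -n "$temperature" ]; then set -- "$@" --url-query "temperature=${temperature}"; fi
  curl "$@" -F "file=@${file}"
}
```

For example, transcribe_with_options audio.flac en "This is a technical podcast about machine learning." 0.2 sends all three parameters, while transcribe_with_options audio.flac relies on auto-detection and the default settings.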
Authenticate with your MKA1 API key at the API gateway: Authorization: Bearer <mka1-api-key>. For multi-user server-side integrations, also send X-On-Behalf-Of to identify the end user making the request.
Optional external user identifier for multi-user server-side integrations. Use this when acting on behalf of one of your end users.
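Putting the two headers together, an authenticated request might look like the sketch below. The environment variable MKA1_API_KEY, the helper name transcribe_as, and any user identifier passed to it are illustrative assumptions. Only the header names and the Bearer scheme come from the text above. The localhost URL is reused from the earlier examples and may differ from the gateway address.

```shell
# Usage: transcribe_as FILE [END_USER_ID]
# Assumes the API key is exported as MKA1_API_KEY (hypothetical variable name).
transcribe_as() {
  file="$1"; end_user="${2:-}"
  if [ -z "${MKA1_API_KEY:-}" ]; then
    echo "MKA1_API_KEY is not set" >&2
    return 1
  fi
  set -- -H "Authorization: Bearer ${MKA1_API_KEY}"
  # X-On-Behalf-Of is only needed when acting for one of your end users.
  if [ -n "$end_user" ]; then set -- "$@" -H "X-On-Behalf-Of: ${end_user}"; fi
  curl -X POST "http://localhost:3000/api/v1/llm/speech/transcriptions" \
    "$@" -F "file=@${file}"
}
```

A single-user script would call transcribe_as audio.flac, while a multi-user backend would pass its own identifier for the end user as the second argument.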
The Whisper model to use for transcription. 'whisper-large-v3' (default) offers better accuracy; 'whisper-large-v3-turbo' is faster but less accurate for some languages.
Available options: whisper-large-v3, whisper-large-v3-turbo. Example: "whisper-large-v3"
The language of the input audio in ISO-639-1 format (e.g., 'en', 'es', 'fr'). If not specified, the model will auto-detect the language.
Example: "en"
Optional text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.
Example: "This is a technical podcast about machine learning."
The sampling temperature, between 0 and 1. Higher values make the output more random; lower values make it more focused and deterministic.
Required range: 0 <= x <= 1. Example: 0.2
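A client can reject out-of-range values before making a request. The following is a minimal sketch that uses awk for the decimal comparison. The helper name valid_temperature is hypothetical, and the accepted input syntax (plain decimals such as 0, 0.2 or 1) is an assumption of this example rather than something the API specifies.

```shell
# Succeeds only for a plain decimal number in the closed range [0, 1].
valid_temperature() {
  awk -v t="$1" 'BEGIN {
    ok = (t ~ /^[0-9]*\.?[0-9]+$/) && (t + 0 >= 0) && (t + 0 <= 1)
    exit !ok
  }'
}
```

A script can then guard the request with a check such as valid_temperature "$temperature" || exit 1 before calling curl.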
Audio file to transcribe.
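File Requirements: one of the supported formats (FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WebM), at most 25MB in total.
Upload Options: complete file upload (multipart/form-data) or chunked upload (Transfer-Encoding: chunked).
Note: For chunked uploads, the server buffers all chunks before processing. Transcription begins only after the complete file is received.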
OK
Response from the transcription endpoint containing the transcribed text, detected language, and confidence score.