Use the MKA1 API speech endpoints when you need file-based speech-to-text or text-to-speech. For real-time, bidirectional voice sessions, use Advanced voice mode.
Choose the right endpoint
| Use case | Endpoint | Notes |
|---|---|---|
| Transcribe a recorded file | Speech-to-text transcription | Upload audio with multipart/form-data |
| Generate a WAV file from text | Text-to-speech | Best for complete file generation |
| Start playback as soon as audio arrives | Streaming text-to-speech | Best for low-latency playback |
Transcribe audio
Send an audio file to the transcription endpoint when you want text output from a recorded file. If your app acts on behalf of an end user, also send the `X-On-Behalf-Of` header.
Supported audio formats: FLAC, MP3, MP4, MPEG, MPGA, M4A, OGG, WAV, WebM, PCM.
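A minimal sketch of preparing the upload with only the Python standard library. It checks the file extension against the supported formats above, builds a multipart/form-data body, and adds `X-On-Behalf-Of` only when you pass an end-user ID. The form field name `file`, the bearer-token `Authorization` header, and the helper name `build_transcription_request` are assumptions for illustration, not confirmed parts of the MKA1 API. Check the Speech-to-text transcription reference for the actual field names and endpoint URL.

```python
import uuid
from pathlib import Path
from typing import Dict, Optional, Tuple

# Formats listed as supported by the transcription endpoint, matched by file extension.
SUPPORTED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm", "pcm"}


def build_transcription_request(
    filename: str,
    audio: bytes,
    api_key: str,
    on_behalf_of: Optional[str] = None,
) -> Tuple[Dict[str, str], bytes]:
    """Hypothetical helper: return (headers, body) for a multipart/form-data upload."""
    name = Path(filename).name
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext!r}")

    # A random boundary that is very unlikely to appear inside the audio bytes.
    boundary = uuid.uuid4().hex
    # ASSUMPTION: the audio goes in a form field named "file"; confirm in the API reference.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{name}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    headers = {
        # ASSUMPTION: bearer-token authentication.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    # Only identify an end user when the app is acting on someone's behalf.
    if on_behalf_of:
        headers["X-On-Behalf-Of"] = on_behalf_of
    return headers, head + audio + tail
```

You could then send it with `urllib.request.Request(url, data=body, headers=headers, method="POST")`, where `url` is the transcription endpoint from the reference. Building the body by hand keeps the sketch dependency-free; an HTTP client such as `requests` can generate the multipart body for you.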
Separate speakers in one transcript
If you need diarization, enable speaker data in the transcription request. When enabled, the response can include a `speakers` array with speaker-labeled segments and timing metadata.
Use the `text` field when you need a single merged transcript.
Use `speakers` when you need captions, turn-taking, or downstream speaker analytics.
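The sketch below shows one way to turn the `speakers` array into caption-style turns and per-speaker talk time. The segment shape is an assumption: each segment is treated as an object with hypothetical `speaker`, `start`, `end` (seconds), and `text` keys, and the request parameter that enables speaker data is not shown because its name is not given here. Because the response only can include `speakers`, the helper falls back to an empty list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def speaker_segments(response: dict) -> List[dict]:
    # The speakers array may be absent, so fall back to an empty list.
    return response.get("speakers") or []


def merge_turns(segments: List[dict]) -> List[Turn]:
    """Collapse consecutive segments from the same speaker into one turn (e.g. for captions)."""
    turns: List[Turn] = []
    for seg in segments:
        # ASSUMPTION: hypothetical keys "speaker", "start", "end", "text".
        if turns and turns[-1].speaker == seg["speaker"]:
            turns[-1].end = seg["end"]
            turns[-1].text = f"{turns[-1].text} {seg['text']}".strip()
        else:
            turns.append(Turn(seg["speaker"], seg["start"], seg["end"], seg["text"]))
    return turns


def talk_time(segments: List[dict]) -> Dict[str, float]:
    """Total seconds spoken per speaker label (e.g. for speaker analytics)."""
    totals: Dict[str, float] = {}
    for seg in segments:
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + (seg["end"] - seg["start"])
    return totals
```

Merging consecutive segments keeps captions readable when the transcriber splits one speaker's turn into several short segments.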
Generate speech
Use the standard text-to-speech endpoint when you want a complete WAV file. The response body is binary audio, and the response headers include `X-Language-Code`.
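As a hedged sketch, assuming you already have the response body as bytes and the headers as a mapping (for example from `urllib.request.urlopen`), the hypothetical helper `save_speech` below checks that the body is a WAV file, saves it, and reads `X-Language-Code`. The duration step uses Python's `wave` module, which reads integer-PCM WAV files; if the API returns another WAV encoding, drop that step.

```python
import io
import wave
from pathlib import Path
from typing import Mapping, Optional, Tuple


def save_speech(body: bytes, headers: Mapping[str, str], out_path: str) -> Tuple[Optional[str], float]:
    """Hypothetical helper: write a WAV body to disk; return (language code, duration in seconds)."""
    # WAV files start with a RIFF header whose format tag (bytes 8-12) is "WAVE".
    if body[:4] != b"RIFF" or body[8:12] != b"WAVE":
        raise ValueError("response body is not a WAV file")
    # Read the duration from the WAV header (integer PCM only with the wave module).
    with wave.open(io.BytesIO(body), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    Path(out_path).write_bytes(body)
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-language-code"), duration
```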
Stream speech for lower latency
Use streaming text-to-speech when you want playback to start before the full audio file is ready. Choose `mp3` for smaller payloads or `pcm` for uncompressed audio.
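A minimal sketch of consuming the stream incrementally: it reads the response in chunks, hands each chunk to a playback callback as soon as it arrives, and records the time to the first chunk. The callback, the chunk size, and the `frame_bytes` parameter are assumptions for illustration. For `pcm`, set `frame_bytes` to the sample width times the channel count (for example, 2 for 16-bit mono, if that is what the API returns; the sample format is not specified here) so a chunk boundary never splits a sample. For `mp3`, leave it at 1 and let your decoder handle framing.

```python
import io
import time
from typing import BinaryIO, Callable, Optional


def stream_audio(
    resp: BinaryIO,
    on_chunk: Callable[[bytes], None],
    chunk_size: int = 4096,
    frame_bytes: int = 1,
) -> Optional[float]:
    """Forward audio to on_chunk as it arrives; return seconds until the first chunk, or None."""
    started = time.monotonic()
    first: Optional[float] = None
    pending = b""
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:
            break  # end of stream
        if first is None:
            first = time.monotonic() - started
        pending += chunk
        # Only forward whole frames; carry any partial sample into the next read.
        usable = len(pending) - len(pending) % frame_bytes
        if usable:
            on_chunk(pending[:usable])
            pending = pending[usable:]
    # Any trailing partial frame is incomplete audio and is dropped.
    return first
```

With `urllib.request.urlopen`, the returned response object supports `read(n)`, so it can be passed directly as `resp`.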
Next steps
- Review the Speech-to-text transcription reference for request and response details
- Review the Text-to-speech reference for WAV generation
- Review the Streaming text-to-speech reference for low-latency output
- Use Advanced voice mode for real-time conversations