The Responses API accepts text, images, audio, and files in a single request. Use structured input with content arrays to combine modalities.

Supported input types

| Type | Content type | Formats | Delivery |
| --- | --- | --- | --- |
| Text | input_text | Plain text | Inline |
| Image | input_image | JPEG, PNG, WebP, GIF, TIFF | URL, base64 data URI, or file_id |
| Audio | input_audio | WAV, MP3 | Base64 |
| Document | input_file | PDF, DOCX, XLSX, PPTX, RTF, TXT, CSV | URL, base64 data URI, or file_id |
| Video | input_file | MP4 | Base64 data URI or file_id |

Image input

Send an image for the model to describe, analyze, or answer questions about. Provide the image as a URL, a base64 data URI, or a previously uploaded file_id.

Image via URL

mka1 llm responses create \
  -H 'X-On-Behalf-Of: <end-user-id>' \
  --body '{
    "model": "auto",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          { "type": "input_text", "text": "Describe what you see in this image." },
          {
            "type": "input_image",
            "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/1200px-Cat03.jpg"
          }
        ]
      }
    ]
  }'

Image via base64

Encode the image as a data URI with the appropriate MIME type.

# tr strips the line breaks that GNU base64 adds every 76 characters
IMAGE_B64=$(base64 < photo.jpg | tr -d '\n')

mka1 llm responses create \
  --body "{
    \"model\": \"auto\",
    \"input\": [
      {
        \"type\": \"message\",
        \"role\": \"user\",
        \"content\": [
          { \"type\": \"input_text\", \"text\": \"What is in this photo?\" },
          {
            \"type\": \"input_image\",
            \"image_url\": \"data:image/jpeg;base64,${IMAGE_B64}\"
          }
        ]
      }
    ]
  }"
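
The data URI above hard-codes image/jpeg. As a minimal sketch, the helper below picks the MIME type from the file extension and removes any line wrapping from the encoded output. The function names mime_for and data_uri are hypothetical and not part of the mka1 CLI, and the extension map covers only the image formats listed in this guide.

```shell
# Hypothetical helpers (not part of the mka1 CLI).

# Map an image file extension to its MIME type.
mime_for() {
  case "$1" in
    *.jpg|*.jpeg) echo "image/jpeg" ;;
    *.png)        echo "image/png" ;;
    *.webp)       echo "image/webp" ;;
    *.gif)        echo "image/gif" ;;
    *.tif|*.tiff) echo "image/tiff" ;;
    *)            echo "unsupported image type: $1" >&2; return 1 ;;
  esac
}

# Print a data URI for the given file.
# tr removes the line breaks that GNU base64 inserts every 76 characters.
data_uri() {
  mime=$(mime_for "$1") || return 1
  printf 'data:%s;base64,%s' "$mime" "$(base64 < "$1" | tr -d '\n')"
}

# Example: IMAGE_URI=$(data_uri photo.jpg)
```

The resulting value can be placed directly in the image_url field. Extensions are matched case-sensitively in this sketch, so photo.JPG would be rejected.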

Image via file_id

Upload an image with the Files API first, then reference it by ID.

# Upload the image
FILE_ID=$(mka1 llm files upload \
  --file @photo.jpg \
  --purpose assistants | jq -r '.id')

# Use the file_id
mka1 llm responses create \
  --body "{
    \"model\": \"auto\",
    \"input\": [
      {
        \"type\": \"message\",
        \"role\": \"user\",
        \"content\": [
          { \"type\": \"input_text\", \"text\": \"Describe this image.\" },
          { \"type\": \"input_image\", \"file_id\": \"${FILE_ID}\" }
        ]
      }
    ]
  }"
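
jq -r prints the literal string null when the upload response has no id field, so a failed upload can quietly produce a request with a bogus file_id. A small guard, sketched here under a hypothetical function name, catches that case before the second call.

```shell
# Hypothetical guard (not part of the mka1 CLI): succeed only when the
# extracted ID is non-empty and is not jq's literal "null".
require_file_id() {
  case "$1" in
    ''|null) echo "upload did not return a file id" >&2; return 1 ;;
    *)       return 0 ;;
  esac
}

# Example, placed between the upload and the responses call:
#   require_file_id "$FILE_ID" || exit 1
```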

Audio input

Send audio for the model to process. The audio is transcribed automatically, and the model responds to the spoken content. Supported formats: WAV and MP3 (max 25 MB).

# tr strips the line breaks that GNU base64 adds every 76 characters
AUDIO_B64=$(base64 < recording.wav | tr -d '\n')

mka1 llm responses create \
  --body "{
    \"model\": \"auto\",
    \"input\": [
      {
        \"type\": \"message\",
        \"role\": \"user\",
        \"content\": [
          {
            \"type\": \"input_audio\",
            \"input_audio\": {
              \"data\": \"${AUDIO_B64}\",
              \"format\": \"wav\"
            }
          }
        ]
      }
    ]
  }"
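
Because only WAV and MP3 are accepted and the limit is 25 MB, a local check before encoding can save a failed round trip. The sketch below is a hypothetical helper. It assumes the limit means 25 × 1024 × 1024 bytes of raw audio measured before base64 encoding, which this guide does not specify.

```shell
# Hypothetical pre-flight check (not part of the mka1 CLI).
# Assumption: 25 MB = 25 * 1024 * 1024 bytes, measured before base64 encoding.
MAX_AUDIO_BYTES=$((25 * 1024 * 1024))

# Print the "format" value for the request ("wav" or "mp3"), or fail.
audio_format() {
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
  case "$1" in
    *.wav) fmt=wav ;;
    *.mp3) fmt=mp3 ;;
    *)     echo "unsupported audio type: $1" >&2; return 1 ;;
  esac
  # tr drops the padding some wc implementations print before the count.
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -gt "$MAX_AUDIO_BYTES" ]; then
    echo "audio file exceeds 25 MB: $1" >&2
    return 1
  fi
  echo "$fmt"
}

# Example: AUDIO_FORMAT=$(audio_format recording.wav) || exit 1
```
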

For example, sending a WAV file containing “Hello, how are you today?” returns:

{
  "status": "completed",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Hello! I'm doing well, thank you for asking. I'm here and ready to help you with any questions or tasks you might have. How can I assist you today?"
        }
      ]
    }
  ]
}
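
To pull only the assistant's text out of a response shaped like the one above, a jq filter is enough. This sketch assumes the CLI prints the response JSON to standard output, as the file upload example in this guide implies. The function name is hypothetical.

```shell
# Hypothetical helper: read a response JSON document on stdin and print
# the text of every output_text part in every message item.
response_text() {
  jq -r '.output[]
         | select(.type == "message")
         | .content[]
         | select(.type == "output_text")
         | .text'
}

# Example: mka1 llm responses create --body "$BODY" | response_text
```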

Document input

Send documents for the model to read and reason over. PDFs and scanned documents are processed with OCR automatically; no extra configuration is needed.

Document via URL

mka1 llm responses create \
  --body '{
    "model": "auto",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          { "type": "input_text", "text": "Summarize this document in three bullet points." },
          {
            "type": "input_file",
            "file_url": "https://example.com/report.pdf",
            "filename": "report.pdf"
          }
        ]
      }
    ]
  }'

Document via base64

Encode the file as a data URI. Include the MIME type so the API can route it to the correct processor.

# tr strips the line breaks that GNU base64 adds every 76 characters
PDF_B64=$(base64 < contract.pdf | tr -d '\n')

mka1 llm responses create \
  --body "{
    \"model\": \"auto\",
    \"input\": [
      {
        \"type\": \"message\",
        \"role\": \"user\",
        \"content\": [
          { \"type\": \"input_text\", \"text\": \"What are the key terms in this contract?\" },
          {
            \"type\": \"input_file\",
            \"file_data\": \"data:application/pdf;base64,${PDF_B64}\",
            \"filename\": \"contract.pdf\"
          }
        ]
      }
    ]
  }"

Scanned documents and OCR

Scanned PDFs and images of documents are processed automatically. The API uses OCR to extract text from:
  • Scanned PDF pages (converted to images at 150 DPI, then OCR’d)
  • Photos of documents (JPEG, PNG, TIFF)
  • Office files (DOCX, XLSX, PPTX), which are converted to PDF first, then OCR’d

Multi-page documents are processed in parallel. The extracted text is returned as Markdown and passed to the model for reasoning. No special parameters are needed: send the file as input_file, and the pipeline handles detection, conversion, and OCR.

Supported document formats

| Format | MIME type | Processing |
| --- | --- | --- |
| PDF | application/pdf | OCR per page at 150 DPI |
| JPEG / PNG / TIFF / WebP / GIF | image/* | Direct OCR |
| Word (.doc, .docx) | application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document | Convert to PDF, then OCR |
| Excel (.xls, .xlsx) | application/vnd.ms-excel, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Convert to PDF, then OCR |
| PowerPoint (.ppt, .pptx) | application/vnd.ms-powerpoint, application/vnd.openxmlformats-officedocument.presentationml.presentation | Convert to PDF, then OCR |
| RTF | application/rtf | Convert to PDF, then OCR |
| Plain text / CSV | text/plain, text/csv | Read directly |

Size limit: 30 MB per file.
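
The routing in the table can be read as a lookup from MIME type to processing path. The sketch below only mirrors the table for illustration. It is not the API's implementation, and the function name and the route labels it prints are hypothetical.

```shell
# Illustrative lookup that mirrors the document formats table.
# The route labels are made up for this sketch.
processing_for() {
  case "$1" in
    application/pdf)      echo "ocr-per-page" ;;
    image/*)              echo "direct-ocr" ;;
    text/plain|text/csv)  echo "read-directly" ;;
    # Legacy Office types, RTF, and the OOXML types (.docx, .xlsx, .pptx).
    application/msword|application/rtf|\
    application/vnd.ms-excel|application/vnd.ms-powerpoint|\
    application/vnd.openxmlformats-officedocument.*)
                          echo "convert-to-pdf-then-ocr" ;;
    *)                    echo "not in the document formats table: $1" >&2
                          return 1 ;;
  esac
}

# Example: processing_for application/pdf
```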

Mixed input

Combine multiple content types in a single message. The model sees all inputs together and can reason across them.

mka1 llm responses create \
  --body '{
    "model": "auto",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          { "type": "input_text", "text": "Compare the chart in the image with the data in the spreadsheet. Are the numbers consistent?" },
          {
            "type": "input_image",
            "image_url": "https://example.com/chart.png"
          },
          {
            "type": "input_file",
            "file_url": "https://example.com/data.xlsx",
            "filename": "data.xlsx"
          }
        ]
      }
    ]
  }'
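
Hand-escaping quotes becomes error-prone as the content array grows. One alternative, sketched here, is to build the body with jq, which handles JSON escaping of the prompt and URLs. The function name and its argument order are hypothetical; the field names are the ones used in the example above.

```shell
# Hypothetical helper: print a request body with one text part, one image
# part, and one file part. jq escapes the argument values as JSON strings.
mixed_body() {
  jq -n \
    --arg text "$1" \
    --arg image_url "$2" \
    --arg file_url "$3" \
    --arg filename "$4" \
    '{
      "model": "auto",
      "input": [{
        "type": "message",
        "role": "user",
        "content": [
          { "type": "input_text",  "text": $text },
          { "type": "input_image", "image_url": $image_url },
          { "type": "input_file",  "file_url": $file_url, "filename": $filename }
        ]
      }]
    }'
}

# Example:
#   BODY=$(mixed_body 'Are the numbers consistent?' \
#     https://example.com/chart.png https://example.com/data.xlsx data.xlsx)
#   mka1 llm responses create --body "$BODY"
```

Passing values through --arg keeps the JSON template in single quotes, so a prompt that itself contains double quotes does not need manual escaping.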

Next steps