Extract structured data

Use the Extract resource when you want the MKA1 API to turn an uploaded document or image into structured JSON. You can define the schema inline for a one-off request, or save a reusable schema template first.

Create a reusable schema

Save a schema template when you want to run the same extraction shape many times.

mka1 llm extract create-schema \
  -H 'X-On-Behalf-Of: <end-user-id>' \
  --body '{
    "name": "Invoice extraction",
    "description": "Extract invoice header fields from PDF invoices",
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": { "type": "string" },
        "vendor_name": { "type": "string" },
        "total_amount": { "type": "number" },
        "date": { "type": "string", "format": "date" }
      },
      "required": ["invoice_number", "total_amount"]
    },
    "metadata": { "document_type": "invoice" }
  }'

import { SDK } from '@meetkai/mka1';

const mka1 = new SDK({ bearerAuth: 'Bearer <mka1-api-key>' });

const schema = await mka1.llm.extract.createSchema({
  xOnBehalfOf: '<end-user-id>', // optional — attribute the request to one of your end users
  extractionSchema: {
    name: 'Invoice extraction',
    description: 'Extract invoice header fields from PDF invoices',
    schema: {
      type: 'object',
      properties: {
        invoice_number: { type: 'string' },
        vendor_name: { type: 'string' },
        total_amount: { type: 'number' },
        date: { type: 'string', format: 'date' },
      },
      required: ['invoice_number', 'total_amount'],
    },
    metadata: {
      document_type: 'invoice',
    },
  },
});

console.log(schema.data.id); // e.g. "schema_invoice_123"

using MeetKai.MKA1;
using MeetKai.MKA1.Types.Components;

var sdk = new SDK(
    bearerAuth: $"Bearer {YOUR_API_KEY}",
    serverUrl: "https://apigw.mka1.com"
);

// The Extract API is available at sdk.Llm.Extract
// Use sdk.Llm.Extract.CreateSchema(...) to save a reusable schema
// Refer to the API reference for the full method signature and types

from mka1 import SDK

sdk = SDK(bearer_auth="Bearer YOUR_API_KEY")

schema = sdk.llm.extract.create_schema(
    name="Invoice extraction",
    description="Extract invoice header fields from PDF invoices",
    schema={
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "vendor_name": {"type": "string"},
            "total_amount": {"type": "number"},
            "date": {"type": "string", "format": "date"},
        },
        "required": ["invoice_number", "total_amount"],
    },
)

print(schema.data.id)  # e.g. "schema_invoice_123"

curl https://apigw.mka1.com/api/v1/llm/extract/schema \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <mka1-api-key>' \
  --header 'X-On-Behalf-Of: <end-user-id>' \
  --data '{
    "name": "Invoice extraction",
    "description": "Extract invoice header fields from PDF invoices",
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {
          "type": "string"
        },
        "vendor_name": {
          "type": "string"
        },
        "total_amount": {
          "type": "number"
        },
        "date": {
          "type": "string",
          "format": "date"
        }
      },
      "required": [
        "invoice_number",
        "total_amount"
      ]
    },
    "metadata": {
      "document_type": "invoice"
    }
  }'

This returns a schema ID such as schema_invoice_123.
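
Before saving a template, it can help to check locally that the schema is internally consistent. The following is a minimal Python sketch, not part of the MKA1 SDK: `check_schema` is a hypothetical helper that only verifies that every name listed in `required` is also declared under `properties`. JSON Schema itself does not forbid a mismatch, so treat a reported problem as a likely typo rather than a guaranteed API error.

```python
from typing import Any


def check_schema(schema: dict[str, Any]) -> list[str]:
    """Return a list of likely mistakes in an object-type JSON Schema.

    Hypothetical local helper; it never calls the MKA1 API.
    """
    problems: list[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in properties:
            problems.append(f'required field "{name}" is not declared in "properties"')
    return problems


# Same schema as the create-schema examples above.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "vendor_name": {"type": "string"},
        "total_amount": {"type": "number"},
        "date": {"type": "string", "format": "date"},
    },
    "required": ["invoice_number", "total_amount"],
}

print(check_schema(invoice_schema))  # prints []
```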

Extract with the saved schema

Use the saved schema ID to extract data from a file. Send the request as multipart/form-data, which carries the file alongside the model and prompt fields.

mka1 llm extract extract-with-schema \
  --schema-id schema_invoice_123 \
  --model auto \
  --prompt 'Extract the structured invoice fields.' \
  --file ./invoice.pdf

const file = Bun.file('./invoice.pdf');

const result = await mka1.llm.extract.extractWithSchema({
  schemaId: 'schema_invoice_123',
  xOnBehalfOf: '<end-user-id>',
  requestBody: {
    model: 'auto',
    prompt: 'Extract the structured invoice fields.',
    file,
  },
});

console.log(result.data);

using MeetKai.MKA1;
using MeetKai.MKA1.Types.Requests;

var sdk = new SDK(bearerAuth: "Bearer YOUR_API_KEY");

var fileContent = await File.ReadAllBytesAsync("./invoice.pdf");

var result = await sdk.Llm.Extract.ExtractWithSchemaAsync(
    "schema_invoice_123",
    new ExtractWithSchemaRequestBody()
    {
        Model = "auto",
        Prompt = "Extract the structured invoice fields.",
        File = new ExtractWithSchemaFile()
        {
            FileName = "invoice.pdf",
            Content = fileContent,
        },
    }
);

Console.WriteLine(result.ExtractionResponse?.Data);

# Open the file in a with block so the handle is closed after the upload.
with open("./invoice.pdf", "rb") as pdf:
    result = sdk.llm.extract.extract_with_schema(
        schema_id="schema_invoice_123",
        model="auto",
        file={
            "file_name": "invoice.pdf",
            "content": pdf,
        },
    )

print(result.data)

curl https://apigw.mka1.com/api/v1/llm/extract/schema/schema_invoice_123 \
  --request POST \
  --header 'Authorization: Bearer <mka1-api-key>' \
  --header 'X-On-Behalf-Of: <end-user-id>' \
  --form 'model=auto' \
  --form 'prompt=Extract the structured invoice fields.' \
  --form 'file=@./invoice.pdf'

A successful response returns a JSON object with:

success
data containing the extracted fields
metadata about the extraction run
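
As a sketch of how a caller might consume that response, the Python below works on a plain dictionary with the three fields listed above. The sample values, the assumption that `success` is a boolean, and the `missing_required` helper are all illustrative and are not taken from real API output.

```python
from typing import Any


def missing_required(response: dict[str, Any], required: list[str]) -> list[str]:
    """List required fields that are absent or null in response["data"]."""
    # ASSUMPTION: "success" is a boolean flag on the response object.
    if not response.get("success"):
        raise ValueError("extraction did not succeed")
    data = response.get("data") or {}
    return [name for name in required if data.get(name) is None]


# Illustrative response only; real values and metadata keys will differ.
sample_response = {
    "success": True,
    "data": {"invoice_number": "INV-001", "total_amount": 125.5},
    "metadata": {},
}

print(missing_required(sample_response, ["invoice_number", "total_amount"]))  # prints []
```

A non-empty result is a signal to review the document or the prompt before passing the data downstream.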

Use an inline schema for one-off work

If you do not need to reuse the schema, call POST /api/v1/llm/extract instead and provide your JSON Schema inline with the other extraction fields. This endpoint uses multipart/form-data, so send schema (and metadata, if provided) as JSON-encoded strings. Some SDKs and clients may also accept plain objects for these fields.

mka1 llm extract extract \
  --model auto \
  --prompt 'Extract the invoice number and total.' \
  --schema '{
    "type": "object",
    "properties": {
      "invoice_number": { "type": "string" },
      "total_amount": { "type": "number" }
    },
    "required": ["invoice_number", "total_amount"]
  }' \
  --file ./invoice.pdf

const file = Bun.file('./invoice.pdf');

const result = await mka1.llm.extract.extract({
  xOnBehalfOf: '<end-user-id>',
  requestBody: {
    model: 'auto',
    prompt: 'Extract the invoice number and total.',
    // In multipart/form-data, many clients send `schema` as a JSON string.
    // (Some SDKs/clients may also support passing an object.)
    schema: JSON.stringify({
      type: 'object',
      properties: {
        invoice_number: { type: 'string' },
        total_amount: { type: 'number' },
      },
      required: ['invoice_number', 'total_amount'],
    }),
    // In multipart/form-data, `metadata` is commonly sent as a JSON string.
    // metadata: JSON.stringify({ source: 'invoices' }),
    file,
  },
});

console.log(result.data);

using MeetKai.MKA1;
using MeetKai.MKA1.Types.Requests;

var sdk = new SDK(bearerAuth: "Bearer YOUR_API_KEY");

var fileContent = await File.ReadAllBytesAsync("./invoice.pdf");

var result = await sdk.Llm.Extract.ExtractAsync(new ExtractRequestBody()
{
    Model = "auto",
    Prompt = "Extract the invoice number and total.",
    // In multipart/form-data, `Schema` is sent as a JSON string.
    Schema = Newtonsoft.Json.JsonConvert.SerializeObject(new Dictionary<string, object>()
    {
        { "type", "object" },
        { "properties", new Dictionary<string, object>()
            {
                { "invoice_number", new Dictionary<string, object>() { { "type", "string" } } },
                { "total_amount", new Dictionary<string, object>() { { "type", "number" } } },
            }
        },
        { "required", new List<object>() { "invoice_number", "total_amount" } },
    }),
    // In multipart/form-data, `Metadata` is also sent as a JSON string.
    // Metadata = Newtonsoft.Json.JsonConvert.SerializeObject(new Dictionary<string, object>()
    // {
    //     { "source", "invoices" },
    // }),
    File = new ExtractFile()
    {
        FileName = "invoice.pdf",
        Content = fileContent,
    },
});

Console.WriteLine(result.ExtractionResponse?.Data);

import json

# Open the file in a with block so the handle is closed after the upload.
with open("./invoice.pdf", "rb") as pdf:
    result = sdk.llm.extract.extract(
        model="auto",
        # In multipart/form-data, `schema` is sent as a JSON string.
        schema=json.dumps({
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": "number"},
            },
            "required": ["invoice_number", "total_amount"],
        }),
        # In multipart/form-data, `metadata` is also sent as a JSON string.
        # metadata=json.dumps({"source": "invoices"}),
        file={
            "file_name": "invoice.pdf",
            "content": pdf,
        },
    )

print(result.data)

curl https://apigw.mka1.com/api/v1/llm/extract \
  --request POST \
  --header 'Authorization: Bearer <mka1-api-key>' \
  --header 'X-On-Behalf-Of: <end-user-id>' \
  --form 'model=auto' \
  --form 'prompt=Extract the invoice number and total.' \
  --form 'schema={"type":"object","properties":{"invoice_number":{"type":"string"},"total_amount":{"type":"number"}},"required":["invoice_number","total_amount"]}' \
  --form 'file=@./invoice.pdf'
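
The compact `schema` value in the cURL form field does not have to be typed by hand. A minimal Python sketch, using only the standard library, that produces the same string from a regular dictionary (the variable names are illustrative):

```python
import json

inline_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}

# separators=(",", ":") drops the spaces json.dumps adds by default,
# giving a single-line value suitable for a form field.
schema_field = json.dumps(inline_schema, separators=(",", ":"))
print(schema_field)
```

Because the string is plain JSON, `json.loads(schema_field)` returns the original dictionary.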

Use the reusable schema flow when you want a stable contract across many files.
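
One way to apply a saved schema across many files is sketched below in Python. `extract_many` is a hypothetical helper, not an SDK function. It takes the per-file extraction call as a parameter, so it assumes nothing about the SDK beyond what you pass in.

```python
from pathlib import Path
from typing import Any, Callable, Iterable


def extract_many(
    paths: Iterable[Path],
    extract_one: Callable[[Path], Any],
) -> dict[str, Any]:
    """Run one extraction per file and collect results keyed by file name.

    Failures are recorded instead of raised so one bad file does not stop the batch.
    """
    results: dict[str, Any] = {}
    for path in paths:
        try:
            results[path.name] = extract_one(path)
        except Exception as exc:  # broad on purpose: this is the batch boundary
            results[path.name] = {"error": str(exc)}
    return results


# Usage with the extract_with_schema call shown earlier (not executed here):
# def extract_one(path: Path):
#     with open(path, "rb") as pdf:
#         return sdk.llm.extract.extract_with_schema(
#             schema_id="schema_invoice_123",
#             model="auto",
#             file={"file_name": path.name, "content": pdf},
#         ).data
#
# results = extract_many(Path("./invoices").glob("*.pdf"), extract_one)
```

Keying results by file name assumes names are unique within the batch; use the full path as the key if they are not.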

Manage saved schemas

The Extract resource also lets you:

Get a saved schema by ID.
Update a saved schema.
Delete a saved schema.

Use those endpoints when your extraction contract changes over time and you need to keep the schema in sync.
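
The exact paths and HTTP methods for these operations are not listed on this page, so the Python sketch below rests on an assumption: that a saved schema is addressed at the same `/api/v1/llm/extract/schema/<schema-id>` path used for extraction, with GET to read it and DELETE to remove it. `schema_request` is a hypothetical helper that only builds the request and does not send it. Update is left out because its method and body are not shown here. Confirm all of this against the API reference.

```python
import urllib.request

BASE_URL = "https://apigw.mka1.com/api/v1/llm/extract/schema"


def schema_request(method: str, schema_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request against a saved schema.

    ASSUMPTION: schemas live at BASE_URL/<schema_id> and respond to the given
    HTTP method; check the API reference for the real paths and methods.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/{schema_id}",
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


get_req = schema_request("GET", "schema_invoice_123", "<mka1-api-key>")
delete_req = schema_request("DELETE", "schema_invoice_123", "<mka1-api-key>")

print(get_req.get_method(), get_req.full_url)
print(delete_req.get_method(), delete_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it.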
