The MKA1 API enforces rate limits on a per-key basis. Each API key has its own quota configured independently — requests per second, minute, hour, or day. When a key exceeds its limit, the gateway returns 429 Too Many Requests before the request reaches the model. No tokens are consumed and no usage is billed.

Rate limit configuration

Rate limits are set per API key at provisioning time. The available granularities are:
Window        Example
Per second    10 requests/second
Per minute    60 requests/minute
Per hour      1,000 requests/hour
Per day       10,000 requests/day
Limits can be combined — for example, 10 requests/second and 5,000 requests/hour on the same key.
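When limits are combined, a request must have capacity in every configured window at once. The sketch below is an illustrative client-side model of that "all windows must allow it" rule, using simple fixed-window counters; it is not the gateway's actual implementation, and the class and window shapes are hypothetical.

```typescript
// Illustrative model of combined per-key limits: a request is allowed
// only if EVERY configured window still has capacity. This is a sketch,
// not the gateway's real algorithm.
type Window = { limit: number; periodMs: number; count: number; resetAt: number };

class MultiWindowLimiter {
  private windows: Window[];

  constructor(limits: { limit: number; periodMs: number }[]) {
    this.windows = limits.map((l) => ({ ...l, count: 0, resetAt: 0 }));
  }

  // Returns true if the request is allowed under every configured window.
  tryAcquire(now: number = Date.now()): boolean {
    for (const w of this.windows) {
      if (now >= w.resetAt) {
        // Fixed window has elapsed: reset the counter for a fresh window.
        w.count = 0;
        w.resetAt = now + w.periodMs;
      }
    }
    // If any window is already full, the whole request is rejected (429).
    if (this.windows.some((w) => w.count >= w.limit)) return false;
    for (const w of this.windows) w.count++;
    return true;
  }
}
```

For example, with 2 requests/second and 3 requests/hour on the same key, a third request in the same second is rejected by the per-second window even though the hourly window still has room.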

Demonstration: rate limiting in action

The following example uses a real API key configured with a rate limit of 1 request per minute. The first request succeeds, and the second request — sent immediately after — is rejected with HTTP 429.

Request 1 — succeeds

curl https://apigw.mka1.com/api/v1/llm/responses \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer mk-YfcgFr...BsFLzg' \
  --data '{
    "model": "auto",
    "input": "Hello, how are you?",
    "stream": false
  }'
Response: HTTP 200
{
  "id": "resp_5ab19d58984940c88a971c6829ae201c",
  "object": "response",
  "status": "completed",
  "model": "auto",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{ "type": "output_text", "text": "Hello! I'm doing well..." }]
    }
  ]
}

Request 2 — rate limited (sent immediately after)

curl https://apigw.mka1.com/api/v1/llm/responses \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer mk-YfcgFr...BsFLzg' \
  --data '{
    "model": "auto",
    "input": "This should be rate limited",
    "stream": false
  }'
Response: HTTP 429
{
  "error": "Rate limit exceeded"
}
The gateway rejects the request before it reaches the model — no tokens are consumed and no usage is billed.

Programmatic example

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: '<mka1-api-key>',
  baseURL: 'https://apigw.mka1.com/api/v1/llm/',
});

// First request — succeeds
const response = await openai.responses.create({
  model: 'auto',
  input: 'Hello',
  stream: false,
});
console.log(response.status); // "completed"

// Second request — rate limited
try {
  await openai.responses.create({
    model: 'auto',
    input: 'This will be rate limited',
    stream: false,
  });
} catch (err) {
  if (err instanceof OpenAI.RateLimitError) {
    console.log(err.status);   // 429
    console.log(err.message);  // "429 Rate limit exceeded"
    console.log(err.error);    // { error: "Rate limit exceeded" }
  }
}

The 429 response

When a request is rate limited, the gateway returns:
Field            Value
HTTP status      429 Too Many Requests
Content-Type     application/json
Body             {"error": "Rate limit exceeded"}
Tokens consumed  0 (the request never reaches the model)
Usage billed     None
The OpenAI SDK surfaces this as an OpenAI.RateLimitError with err.status === 429. The MKA1 SDK throws an APIError with err.statusCode === 429.
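Because the two SDKs expose the status under different property names (err.status vs. err.statusCode), retry logic that needs to work with either can normalize the check. The helper below is a small illustrative sketch; only the two property names described above are assumed.

```typescript
// Detects a rate-limit error from either SDK: the OpenAI SDK exposes
// err.status, while the MKA1 SDK's APIError exposes err.statusCode.
function isRateLimitError(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) return false;
  const e = err as { status?: number; statusCode?: number };
  return e.status === 429 || e.statusCode === 429;
}
```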

Handling rate limits in your application

When your application receives a 429 response, retry with exponential backoff:
async function requestWithRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Retry only on 429, and only while attempts remain.
      if (err.status === 429 && attempt < maxRetries - 1) {
        const waitMs = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        await new Promise((r) => setTimeout(r, waitMs));
        continue;
      }
      throw err;
    }
  }
  throw new Error('unreachable'); // the loop always returns or throws
}

const response = await requestWithRetry(() =>
  openai.responses.create({ model: 'auto', input: 'Hello', stream: false })
);
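A common refinement to the fixed 1s/2s/4s schedule is to add jitter, so that many clients hitting the same limit do not all retry at the same instant. This is a general client-side pattern, not an MKA1 requirement; the helper below is a sketch of "full jitter", drawing a random delay from [0, base × 2^attempt].

```typescript
// Exponential backoff delay with optional "full jitter": without jitter
// the delay is base * 2^attempt (1s, 2s, 4s, ...); with jitter it is a
// random value in [0, base * 2^attempt], which spreads out retries from
// many clients that were rate limited at the same moment.
function backoffDelayMs(attempt: number, baseMs = 1000, jitter = false): number {
  const cap = baseMs * Math.pow(2, attempt);
  return jitter ? Math.random() * cap : cap;
}
```

To use it, replace the waitMs computation in the retry loop above with backoffDelayMs(attempt, 1000, true).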

See also