# Rate limiting

> Per-key rate limits with configurable quotas per second, minute, hour, or day. Includes a real HTTP 429 demonstration and retry patterns.

The MKA1 API enforces rate limits on a **per-key basis**.
Each API key has its own quota configured independently — requests per second, minute, hour, or day.
When a key exceeds its limit, the gateway returns `429 Too Many Requests` before the request reaches the model. No tokens are consumed and no usage is billed.

## Rate limit configuration

Rate limits are set per API key at provisioning time. The available granularities are:

| Window     | Example             |
| ---------- | ------------------- |
| Per second | 10 requests/second  |
| Per minute | 60 requests/minute  |
| Per hour   | 1,000 requests/hour |
| Per day    | 10,000 requests/day |

Limits can be combined — for example, 10 requests/second **and** 5,000 requests/hour on the same key.
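
Combined limits mean a request has to fit inside every configured window at once. This page does not say which windowing algorithm the gateway uses, so the following is only a client-side sketch that assumes a sliding window. `MultiWindowLimiter` and its methods are hypothetical names, not part of any MKA1 SDK. A limiter like this can help a client pace itself so that it rarely triggers a 429 in the first place.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Client-side sliding-window limiter that enforces several quotas at once.

    Hypothetical helper: the gateway's own algorithm is not documented here.
    """

    def __init__(self, limits, clock=time.monotonic):
        # limits: (max_requests, window_seconds) pairs, e.g. [(10, 1), (5000, 3600)]
        self.limits = list(limits)
        self.clock = clock
        self.longest = max(window for _, window in self.limits)
        self.sent = deque()  # timestamps of requests already sent, oldest first

    def wait_time(self):
        """Seconds to wait until the next request fits every window (0.0 if it fits now)."""
        now = self.clock()
        # Timestamps older than the longest window cannot affect any limit.
        while self.sent and now - self.sent[0] >= self.longest:
            self.sent.popleft()
        wait = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # A slot frees up once this request leaves the window.
                blocking = recent[-max_requests]
                wait = max(wait, blocking + window - now)
        return wait

    def record(self):
        """Call right after sending a request."""
        self.sent.append(self.clock())
```

For the combined example above, `MultiWindowLimiter([(10, 1), (5000, 3600)])` would model 10 requests/second and 5,000 requests/hour: call `wait_time()` before each request, sleep for that long, then call `record()` after sending. The gateway remains the source of truth, so a 429 can still occur and should still be handled.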

## Demonstration: rate limiting in action

The following example uses a real API key configured with a rate limit of **1 request per minute**. The first request succeeds, and the second request — sent immediately after — is rejected with HTTP 429.

### Request 1 — succeeds

```bash theme={null}
curl https://apigw.mka1.com/api/v1/llm/responses \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer mk-YfcgFr...BsFLzg' \
  --data '{
    "model": "auto",
    "input": "Hello, how are you?",
    "stream": false
  }'
```

**Response: HTTP 200**

```json theme={null}
{
  "id": "resp_5ab19d58984940c88a971c6829ae201c",
  "object": "response",
  "status": "completed",
  "model": "auto",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{ "type": "output_text", "text": "Hello! I'm doing well..." }]
    }
  ]
}
```

### Request 2 — rate limited (sent immediately after)

```bash theme={null}
curl https://apigw.mka1.com/api/v1/llm/responses \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer mk-YfcgFr...BsFLzg' \
  --data '{
    "model": "auto",
    "input": "This should be rate limited",
    "stream": false
  }'
```

**Response: HTTP 429**

```json theme={null}
{
  "error": "Rate limit exceeded"
}
```

The gateway rejects the request before it reaches the model — no tokens are consumed and no usage is billed.

### Programmatic example

<CodeGroup>
  ```bash CLI theme={null}
  # First request — succeeds (HTTP 200)
  mka1 llm responses create \
    --model auto \
    --input '"Hello"' \
    --include-headers \
    -H 'X-On-Behalf-Of: <end-user-id>'

  # Second request — rate limited (HTTP 429)
  mka1 llm responses create \
    --model auto \
    --input '"This will be rate limited"' \
    --include-headers \
    -H 'X-On-Behalf-Of: <end-user-id>'
  ```

  ```ts OpenAI SDK theme={null}
  import OpenAI from 'openai';

  const openai = new OpenAI({
    apiKey: '<mka1-api-key>',
    baseURL: 'https://apigw.mka1.com/api/v1/llm/',
  });

  // First request — succeeds
  const response = await openai.responses.create({
    model: 'auto',
    input: 'Hello',
    stream: false,
  });
  console.log(response.status); // "completed"

  // Second request — rate limited
  try {
    await openai.responses.create({
      model: 'auto',
      input: 'This will be rate limited',
      stream: false,
    });
  } catch (err) {
    if (err instanceof OpenAI.RateLimitError) {
      console.log(err.status);   // 429
      console.log(err.message);  // "429 Rate limit exceeded"
      console.log(err.error);    // { error: "Rate limit exceeded" }
    }
  }
  ```

  ```ts MKA1 SDK theme={null}
  import { SDK } from '@meetkai/mka1';
  import { APIError } from '@meetkai/mka1/models/errors';

  const mka1 = new SDK({
    bearerAuth: `Bearer <mka1-api-key>`,
  });

  // First request — succeeds
  const response = await mka1.llm.responses.create({
    model: 'auto',
    input: 'Hello',
  });
  console.log(response.status); // "completed"

  // Second request — rate limited
  try {
    await mka1.llm.responses.create({
      model: 'auto',
      input: 'This will be rate limited',
    });
  } catch (err) {
    if (err instanceof APIError && err.statusCode === 429) {
      console.log(err.statusCode); // 429
      console.log(err.body);       // '{"error":"Rate limit exceeded"}'
    }
  }
  ```

  ```csharp C# SDK theme={null}
  using MeetKai.MKA1;
  using MeetKai.MKA1.Types.Components;
  using MeetKai.MKA1.Types.Errors;

  var sdk = new SDK(bearerAuth: "Bearer YOUR_API_KEY");

  // First request — succeeds
  var response = await sdk.Llm.Responses.CreateAsync(new ResponsesCreateRequest()
  {
      Model = "auto",
      Input = ResponsesCreateRequestInput.CreateStr("Hello"),
  });

  // Second request — may be rate limited
  try
  {
      await sdk.Llm.Responses.CreateAsync(new ResponsesCreateRequest()
      {
          Model = "auto",
          Input = ResponsesCreateRequestInput.CreateStr("This will be rate limited"),
      });
  }
  catch (APIException ex) when ((int)ex.Response.StatusCode == 429)
  {
      Console.WriteLine(ex.Response.StatusCode); // TooManyRequests
      Console.WriteLine(ex.Body);               // {"error":"Rate limit exceeded"}
  }
  ```

  ```python Python SDK theme={null}
  import time
  from mka1 import SDK
  from mka1.errors import SDKDefaultError

  sdk = SDK(bearer_auth="Bearer YOUR_API_KEY")

  try:
      res = sdk.llm.responses.create(
          model="auto",
          input="Summarize this document.",
      )
  except SDKDefaultError as e:
      if e.status_code == 429:
          time.sleep(int(e.headers.get("retry-after", 1)))
          res = sdk.llm.responses.create(
              model="auto",
              input="Summarize this document.",
          )
      else:
          raise
  ```

  ```bash bash theme={null}
  # First request — succeeds (HTTP 200)
  curl -s -w "\nHTTP %{http_code}" https://apigw.mka1.com/api/v1/llm/responses \
    --request POST \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <mka1-api-key>' \
    --data '{ "model": "auto", "input": "Hello", "stream": false }'

  # Second request — rate limited (HTTP 429)
  curl -s -w "\nHTTP %{http_code}" https://apigw.mka1.com/api/v1/llm/responses \
    --request POST \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <mka1-api-key>' \
    --data '{ "model": "auto", "input": "This will be rate limited", "stream": false }'
  ```
</CodeGroup>

## The 429 response

When a request is rate limited, the gateway returns:

| Field               | Value                                   |
| ------------------- | --------------------------------------- |
| **HTTP status**     | `429 Too Many Requests`                 |
| **Content-Type**    | `application/json`                      |
| **Body**            | `{"error": "Rate limit exceeded"}`      |
| **Tokens consumed** | 0 — the request never reaches the model |
| **Usage billed**    | None                                    |

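The OpenAI SDK surfaces this as an `OpenAI.RateLimitError` with `err.status === 429`.
The MKA1 TypeScript SDK throws an `APIError` with `err.statusCode === 429`, the C# SDK throws an `APIException` whose `Response.StatusCode` is `TooManyRequests`, and the Python SDK raises an `SDKDefaultError` with `status_code == 429`.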
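
If you call the API with a plain HTTP client instead of an SDK, you can check for the same response shape directly. The helper below is a minimal sketch: `rate_limit_message` is a hypothetical name, and it assumes you already have the status code and raw body text from your HTTP client.

```python
import json
from typing import Optional


def rate_limit_message(status_code: int, body_text: str) -> Optional[str]:
    """Return the gateway's error message for a 429 response, or None otherwise."""
    if status_code != 429:
        return None
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Unexpected non-JSON body: surface it unchanged rather than hiding it.
        return body_text
    if isinstance(payload, dict) and isinstance(payload.get("error"), str):
        return payload["error"]
    return body_text
```

A `None` result means the response was not rate limited. Any string result means the request was rejected and can be retried later.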

## Handling rate limits in your application

When your application receives a 429 response, retry with exponential backoff:

```ts theme={null}
async function requestWithRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // OpenAI SDK errors expose `status`; MKA1 SDK errors expose `statusCode`.
      const status = err?.status ?? err?.statusCode;
      if (status === 429 && attempt < maxRetries - 1) {
        // Waits 1s, then 2s with the default maxRetries of 3 (the third failure is rethrown).
        const waitMs = Math.pow(2, attempt) * 1000;
        await new Promise((r) => setTimeout(r, waitMs));
        continue;
      }
      throw err;
    }
  }
}

const response = await requestWithRetry(() =>
  openai.responses.create({ model: 'auto', input: 'Hello', stream: false })
);
```
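
The same pattern can be sketched in Python with two common refinements: a cap on the delay, and random jitter so that many clients do not retry in lockstep. `call_with_backoff` and its parameters are hypothetical names. It assumes the raised error exposes the HTTP status as `status_code` (as in the Python SDK example earlier on this page) or as `status`.

```python
import random
import time


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on HTTP 429 with capped exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            # Assumption: the error carries the HTTP status as status_code or status.
            status = getattr(err, "status_code", None) or getattr(err, "status", None)
            if status != 429 or attempt == max_attempts - 1:
                raise  # not a rate limit, or out of attempts
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Half of the delay is fixed and half is random (jitter).
            sleep(delay / 2 + rng() * delay / 2)
```

Backoff only helps when the total wait is on the scale of the limit's window. For a key limited per minute, hour, or day, raise `max_delay` accordingly or queue the work instead of retrying within a few seconds.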

## See also

* [Authentication](/docs/authentication) for API key setup and the `X-On-Behalf-Of` pattern.
* [Authorization](/docs/authorization) for resource-level access control.
* [Generate a response](/docs/generate-a-response) for the base Responses request shape.