429 Too Many Requests before the request reaches the model. No tokens are consumed and no usage is billed.
Rate limit configuration
Rate limits are set per API key at provisioning time. The available granularities are:

| Window | Example |
|---|---|
| Per second | 10 requests/second |
| Per minute | 60 requests/minute |
| Per hour | 1,000 requests/hour |
| Per day | 10,000 requests/day |
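Because the limit is attached to the key, a client can avoid most 429s by pacing its own requests. The sketch below computes the even spacing between requests that keeps a single sender at or below each example limit above. It assumes the gateway counts requests over a window of the stated length; the `RateLimit` shape and `minIntervalMs` helper are hypothetical client-side names, not part of any SDK.

```typescript
// Hypothetical client-side representation of a key's configured limit.
type LimitWindow = "second" | "minute" | "hour" | "day";

interface RateLimit {
  limit: number; // requests allowed per window
  window: LimitWindow;
}

const WINDOW_MS: Record<LimitWindow, number> = {
  second: 1_000,
  minute: 60_000,
  hour: 3_600_000,
  day: 86_400_000,
};

// Smallest even spacing (in ms) between requests so that one sender
// never exceeds `limit` requests in a window of the given length.
function minIntervalMs(rl: RateLimit): number {
  return Math.ceil(WINDOW_MS[rl.window] / rl.limit);
}

console.log(minIntervalMs({ limit: 10, window: "second" }));   // 100
console.log(minIntervalMs({ limit: 60, window: "minute" }));   // 1000
console.log(minIntervalMs({ limit: 1_000, window: "hour" }));  // 3600
console.log(minIntervalMs({ limit: 10_000, window: "day" }));  // 8640
```

Even spacing is conservative: it never uses any burst a window might allow, and it does not coordinate several processes that share one key.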
Demonstration: rate limiting in action
The following example uses a real API key configured with a rate limit of 1 request per minute. The first request succeeds, and the second request, sent immediately after, is rejected with HTTP 429.

Request 1: succeeds

Request 2: rate limited (sent immediately after)
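This page does not specify which limiting algorithm the gateway uses. To reason about the demonstration locally, the sketch below models it with a hypothetical per-key fixed-window counter whose window starts at the key's first request. It only reproduces the observed behavior (first request allowed, immediate second request rejected with the documented body) and is not the gateway's implementation.

```typescript
// Hypothetical local model; NOT the gateway's actual algorithm.
type LimitDecision = { status: 200 | 429; body: { error: string } | null };

class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  // One counter per API key, since limits are configured per key.
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(apiKey: string, nowMs: number): LimitDecision {
    const w = this.windows.get(apiKey);
    // Open a fresh window on the key's first request or once the old one has elapsed.
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      this.windows.set(apiKey, { start: nowMs, count: 1 });
      return { status: 200, body: null };
    }
    if (w.count < this.limit) {
      w.count += 1;
      return { status: 200, body: null };
    }
    // Over the limit: reject with the documented 429 body.
    return { status: 429, body: { error: "Rate limit exceeded" } };
  }
}

// 1 request per minute, as in the demonstration.
const limiter = new FixedWindowLimiter(1, 60_000);
console.log(limiter.check("demo-key", 0).status);      // 200
console.log(limiter.check("demo-key", 50).status);     // 429
console.log(limiter.check("demo-key", 60_000).status); // 200
```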
Programmatic example
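A minimal sketch of the same two-request demonstration in TypeScript. The endpoint URL, model name, request body, and Bearer authorization scheme are all assumptions for illustration; substitute the values from the Authentication and Generate a response pages. The fetch function is injected so the sketch runs in any runtime with a fetch-compatible API.

```typescript
// Hypothetical endpoint and model name: substitute your gateway's values.
const RESPONSES_URL = "https://gateway.example.com/v1/responses";
const MODEL = "your-model-name";

// Minimal structural types so any fetch-compatible function can be injected.
type MinimalResponse = { status: number; text(): Promise<string> };
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<MinimalResponse>;

type Attempt = { status: number; body: string };

// Sends two identical requests back to back and records each status and body.
async function sendTwice(apiKey: string, fetchImpl: FetchLike): Promise<Attempt[]> {
  const attempts: Attempt[] = [];
  for (let i = 0; i < 2; i++) {
    const res = await fetchImpl(RESPONSES_URL, {
      method: "POST",
      headers: {
        // Assumed Bearer scheme; see the Authentication page for the actual header.
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: MODEL, input: "Say hello." }),
    });
    attempts.push({ status: res.status, body: await res.text() });
  }
  return attempts;
}
```

Passing the runtime's global `fetch` (Node 18+ or a browser) with a key limited to 1 request per minute should yield a 200 followed by a 429 whose body is `{"error": "Rate limit exceeded"}`.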
The 429 response
When a request is rate limited, the gateway returns:

| Field | Value |
|---|---|
| HTTP status | 429 Too Many Requests |
| Content-Type | `application/json` |
| Body | `{"error": "Rate limit exceeded"}` |
| Tokens consumed | 0 (the request never reaches the model) |
| Usage billed | None |
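Since the status code alone identifies a rate-limited request, client code can branch on it and treat the JSON body as informational. A minimal sketch; `rateLimitMessage` is a hypothetical helper, and the fallback for non-JSON bodies is an assumption covering 429s that might come from something other than the gateway, such as an intermediate proxy.

```typescript
// Returns the error message for a rate-limited response, or null if the
// response was not rate limited. Only the status code decides; the body
// is informational.
function rateLimitMessage(status: number, bodyText: string): string | null {
  if (status !== 429) return null;
  try {
    const parsed: unknown = JSON.parse(bodyText);
    if (typeof parsed === "object" && parsed !== null) {
      const error = (parsed as { error?: unknown }).error;
      if (typeof error === "string") return error; // documented gateway shape
    }
  } catch {
    // Body was not JSON; fall through and report it verbatim.
  }
  return bodyText;
}

console.log(rateLimitMessage(429, '{"error": "Rate limit exceeded"}')); // Rate limit exceeded
```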
The OpenAI SDK throws an `OpenAI.RateLimitError` with `err.status === 429`.
The MKA1 SDK throws an APIError with err.statusCode === 429.
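The next section recommends retrying with exponential backoff. One way to combine that with the two SDK error shapes above is sketched below. It relies only on the properties named here (`status` for the OpenAI SDK, `statusCode` for the MKA1 SDK) so neither SDK needs to be imported; `isRateLimitError`, `withBackoff`, the default delays, and the 10% jitter are hypothetical choices, not values from either SDK or the gateway.

```typescript
// Matches both documented SDK error shapes without importing either SDK.
function isRateLimitError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; statusCode?: unknown };
  return e.status === 429 || e.statusCode === 429;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `call` while it fails with a 429, doubling the delay each time.
// Any other error, or a 429 after maxRetries retries, is rethrown.
async function withBackoff<T>(
  call: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 1_000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRateLimitError(err) || attempt >= maxRetries) throw err;
      const delay = baseDelayMs * 2 ** attempt;   // 1s, 2s, 4s, ... with the defaults
      const jitter = Math.random() * delay * 0.1; // spread out clients that collide
      await sleep(delay + jitter);
    }
  }
}
```

The OpenAI SDK already retries some failures, including 429s, on its own (controlled by its `maxRetries` client option) before throwing, so wrapping its calls in `withBackoff` multiplies the total number of attempts; lowering one of the two limits keeps the worst case predictable.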
Handling rate limits in your application
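When your application receives a 429 response, retry with exponential backoff: wait before resending, and increase the wait exponentially (for example, doubling it) after each consecutive 429 instead of retrying immediately. Rejected attempts consume no tokens and incur no usage charges, so a rejected retry costs only latency.

See also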
- Authentication for API key setup and the `X-On-Behalf-Of` pattern.
- Authorization for resource-level access control.
- Generate a response for the base Responses request shape.