In MKA1, the practical tenant boundary is:
  • a distinct account API key
  • an optional delegated end-user ID via X-On-Behalf-Of
  • downstream resource ownership and usage recorded against that authenticated context
Use the walkthrough below when you want to verify that two tenants are isolated from each other in practice. It gives you an exact Tenant A / Tenant B flow you can run end to end.

Verify tenant isolation in practice

Treat each tenant as a separate account with its own API key. The goal is to show four things:
  1. Tenant A and Tenant B use different API keys.
  2. Tenant A and Tenant B can have different quota settings.
  3. Tenant A and Tenant B can have different policies.
  4. A resource created under Tenant A is not accessible under Tenant B.

Step 1: define two tenants

Start with two separate API keys.
export TENANT_A_KEY="<tenant-a-api-key>"
export TENANT_B_KEY="<tenant-b-api-key>"
export TENANT_A_USER="tenant_a_user"
export TENANT_B_USER="tenant_b_user"
If you already have two production or staging keys, you can use them directly. If you need to provision two keys with different limits for the demonstration, create them separately, then export the returned keys as TENANT_A_KEY and TENANT_B_KEY.
curl https://apigw.mka1.com/api/v1/authentication/api-key/create/rate-limited \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Cookie: better-auth.session_token=<tenant-a-session>' \
  --data '{
    "name": "Tenant A demo key",
    "rateLimitEnabled": true,
    "rateLimitTimeWindow": 3600000,
    "rateLimitMax": 60
  }'
curl https://apigw.mka1.com/api/v1/authentication/api-key/create/rate-limited \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Cookie: better-auth.session_token=<tenant-b-session>' \
  --data '{
    "name": "Tenant B demo key",
    "rateLimitEnabled": true,
    "rateLimitTimeWindow": 3600000,
    "rateLimitMax": 600
  }'
This establishes the first two pieces of isolation immediately:
  • Tenant A and Tenant B do not share a key.
  • Tenant A and Tenant B do not share the same quota configuration.

Step 2: configure different policies for each tenant

Now give the two tenants different guardrail policies. Configure Tenant A to block the word confidential.
curl https://apigw.mka1.com/api/v1/llm/guardrails \
  --request PUT \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${TENANT_A_KEY}" \
  --data '{
    "guardrails": [
      {
        "mode": "ban_words",
        "enabled": true,
        "config": {
          "words": ["confidential"]
        }
      }
    ]
  }'
Configure Tenant B with a different policy.
curl https://apigw.mka1.com/api/v1/llm/guardrails \
  --request PUT \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${TENANT_B_KEY}" \
  --data '{
    "guardrails": [
      {
        "mode": "ban_words",
        "enabled": true,
        "config": {
          "words": ["internal-only"]
        }
      }
    ]
  }'
Then test the same content against both tenants. Tenant A should block this:
curl https://apigw.mka1.com/api/v1/llm/guardrails/test \
  --request POST \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${TENANT_A_KEY}" \
  --data '{
    "content": "Summarize this confidential roadmap."
  }'
Tenant B should evaluate the same content against its own policy instead of Tenant A’s:
curl https://apigw.mka1.com/api/v1/llm/guardrails/test \
  --request POST \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${TENANT_B_KEY}" \
  --data '{
    "content": "Summarize this confidential roadmap."
  }'
The expected result is:
  • Tenant A fails because confidential is blocked in Tenant A’s policy.
  • Tenant B passes because its own policy bans only internal-only; it does not inherit Tenant A’s policy.
This is the clearest practical demonstration of independent policies per tenant.
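The expected outcome can be reasoned about with a small local model of a ban_words policy. This is a minimal sketch, not the platform's implementation: the `blockedWords` helper is hypothetical, and case-insensitive substring matching is an assumption, since the real matching rules are not described here.

```typescript
// Simplified local model of a "ban_words" guardrail.
// ASSUMPTION: case-insensitive substring matching; the platform's actual
// matching rules are not documented on this page.
interface BanWordsPolicy {
  mode: "ban_words";
  enabled: boolean;
  config: { words: string[] };
}

// Returns the banned words found in the content (empty if the policy passes).
function blockedWords(content: string, policy: BanWordsPolicy): string[] {
  if (!policy.enabled) return [];
  const haystack = content.toLowerCase();
  return policy.config.words.filter(
    (word) => haystack.indexOf(word.toLowerCase()) !== -1,
  );
}

const tenantAPolicy: BanWordsPolicy = {
  mode: "ban_words",
  enabled: true,
  config: { words: ["confidential"] },
};
const tenantBPolicy: BanWordsPolicy = {
  mode: "ban_words",
  enabled: true,
  config: { words: ["internal-only"] },
};

const testContent = "Summarize this confidential roadmap.";
const tenantAHits = blockedWords(testContent, tenantAPolicy); // Tenant A: blocked
const tenantBHits = blockedWords(testContent, tenantBPolicy); // Tenant B: passes
```

The same content produces a hit under Tenant A's word list and none under Tenant B's, which is the behavior the two test calls are meant to surface.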

Step 3: demonstrate separate quotas

After creating two keys with different limits, send the same type of request through both keys. Tenant A uses the low-limit key:
for i in $(seq 1 70); do
  curl https://apigw.mka1.com/api/v1/llm/responses \
    --request POST \
    --silent \
    --output /dev/null \
    --write-out "tenant-a request $i -> %{http_code}\n" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${TENANT_A_KEY}" \
    --header "X-On-Behalf-Of: ${TENANT_A_USER}" \
    --data '{
      "model": "openai:gpt-5-mini",
      "input": "Reply with the word ok."
    }'
done
Tenant B uses the higher-limit key:
for i in $(seq 1 70); do
  curl https://apigw.mka1.com/api/v1/llm/responses \
    --request POST \
    --silent \
    --output /dev/null \
    --write-out "tenant-b request $i -> %{http_code}\n" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${TENANT_B_KEY}" \
    --header "X-On-Behalf-Of: ${TENANT_B_USER}" \
    --data '{
      "model": "openai:gpt-5-mini",
      "input": "Reply with the word ok."
    }'
done
The expected result is:
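  • Tenant A starts receiving 429 once it has used the 60 requests its key allows per window, so roughly the last 10 of the 70 requests fail.
  • Tenant B continues to succeed because it has a different key and a different quota (600 requests per window).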
This shows that the tenants are not sharing one global limiter.
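To see why the two loops diverge, it can help to simulate a per-key limiter locally. The sketch below assumes a fixed-window counter and that rateLimitTimeWindow is in milliseconds; neither is confirmed by this page, and `createLimiter` is a hypothetical helper, not part of the MKA1 API.

```typescript
// Local simulation of per-key rate limiting.
// ASSUMPTIONS: fixed-window counting, rateLimitTimeWindow in milliseconds.
interface KeyLimit {
  rateLimitMax: number;
  rateLimitTimeWindow: number;
}

interface WindowState {
  start: number;
  count: number;
}

// Returns a check function that yields the status a caller would see:
// 200 while under the limit, 429 once the key's window is exhausted.
function createLimiter(limits: Record<string, KeyLimit>) {
  const windows: Record<string, WindowState> = {};
  return function check(apiKeyId: string, nowMs: number): number {
    const limit = limits[apiKeyId];
    if (limit === undefined) throw new Error("unknown key: " + apiKeyId);
    let state = windows[apiKeyId];
    if (state === undefined || nowMs - state.start >= limit.rateLimitTimeWindow) {
      state = { start: nowMs, count: 0 }; // start a fresh window
      windows[apiKeyId] = state;
    }
    if (state.count >= limit.rateLimitMax) return 429;
    state.count += 1;
    return 200;
  };
}

const check = createLimiter({
  tenant_a: { rateLimitMax: 60, rateLimitTimeWindow: 3600000 },
  tenant_b: { rateLimitMax: 600, rateLimitTimeWindow: 3600000 },
});

// Send `requests` calls one second apart and count the outcomes.
function countStatuses(apiKeyId: string, requests: number) {
  let ok = 0;
  let limited = 0;
  for (let i = 0; i < requests; i++) {
    if (check(apiKeyId, i * 1000) === 200) ok++;
    else limited++;
  }
  return { ok, limited };
}

const tenantAResult = countStatuses("tenant_a", 70); // { ok: 60, limited: 10 }
const tenantBResult = countStatuses("tenant_b", 70); // { ok: 70, limited: 0 }
```

Each key keeps its own counter, so exhausting Tenant A's window has no effect on Tenant B.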

Step 4: demonstrate resource isolation

Create a resource under Tenant A. A conversation is an easy example because it is visible through the public API.
curl https://apigw.mka1.com/api/v1/llm/conversations \
  --request POST \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${TENANT_A_KEY}" \
  --header "X-On-Behalf-Of: ${TENANT_A_USER}" \
  --data '{
    "metadata": {
      "tenant": "A",
      "demo": "segregated-tenants"
    }
  }'
Assume the response returns conv_tenant_a_123. Tenant A can read it:
curl https://apigw.mka1.com/api/v1/llm/conversations/conv_tenant_a_123 \
  --header "Authorization: Bearer ${TENANT_A_KEY}" \
  --header "X-On-Behalf-Of: ${TENANT_A_USER}"
Tenant B should not be able to read the same resource:
curl https://apigw.mka1.com/api/v1/llm/conversations/conv_tenant_a_123 \
  --header "Authorization: Bearer ${TENANT_B_KEY}" \
  --header "X-On-Behalf-Of: ${TENANT_B_USER}"
You can also show list isolation:
curl 'https://apigw.mka1.com/api/v1/llm/conversations?limit=20' \
  --header "Authorization: Bearer ${TENANT_B_KEY}" \
  --header "X-On-Behalf-Of: ${TENANT_B_USER}"
Tenant B should not see Tenant A’s conversation in its list. This is the practical tenant-enclave proof:
  • the same API surface is used
  • the same platform is used
  • but identity, policy, quota, and resource ownership are enforced separately
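The ownership rule behind these results can be modeled in a few lines. This is an illustrative in-memory sketch only: `ConversationStore`, `AuthContext`, and the choice to scope reads by account ID are assumptions made for the example, not a description of the platform's storage layer.

```typescript
// In-memory model of resource ownership scoped to the authenticated account.
// ASSUMPTION: reads and lists are filtered by the owning account ID.
interface AuthContext {
  accountId: string;
  externalUserId: string; // empty string when X-On-Behalf-Of is omitted
}

interface Conversation {
  id: string;
  ownerAccountId: string;
  externalUserId: string;
  metadata: Record<string, string>;
}

class ConversationStore {
  private items: Conversation[] = [];

  // Records the creator's identity on the resource itself.
  create(ctx: AuthContext, metadata: Record<string, string>): Conversation {
    const conversation: Conversation = {
      id: "conv_" + String(this.items.length + 1),
      ownerAccountId: ctx.accountId,
      externalUserId: ctx.externalUserId,
      metadata: metadata,
    };
    this.items.push(conversation);
    return conversation;
  }

  // Lists only the caller's own conversations.
  list(ctx: AuthContext): Conversation[] {
    return this.items.filter((c) => c.ownerAccountId === ctx.accountId);
  }

  // Looks up by ID within the caller's own conversations only.
  get(ctx: AuthContext, id: string): Conversation | undefined {
    return this.list(ctx).filter((c) => c.id === id)[0];
  }
}

const store = new ConversationStore();
const tenantA: AuthContext = { accountId: "account_a", externalUserId: "tenant_a_user" };
const tenantB: AuthContext = { accountId: "account_b", externalUserId: "tenant_b_user" };

const created = store.create(tenantA, { tenant: "A", demo: "segregated-tenants" });
const readByA = store.get(tenantA, created.id); // the conversation
const readByB = store.get(tenantB, created.id); // undefined
const listedByB = store.list(tenantB);          // empty list
```

Because every lookup starts from the caller's identity, knowing another tenant's conversation ID is not enough to read it.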

Optional Step 5: show separate usage ledgers

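If you want one more visible proof point, query usage separately per tenant. The start_time and end_time values below are example Unix timestamps in seconds (2024-01-01 to 2024-01-02 UTC); replace them with a window that covers your test traffic.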
curl 'https://apigw.mka1.com/api/v1/llm/usage/responses?start_time=1704067200&end_time=1704153600&bucket_width=1d&group_by=api_key_id' \
  --header "Authorization: Bearer ${TENANT_A_KEY}"
curl 'https://apigw.mka1.com/api/v1/llm/usage/responses?start_time=1704067200&end_time=1704153600&bucket_width=1d&group_by=api_key_id' \
  --header "Authorization: Bearer ${TENANT_B_KEY}"
The point of this step is not just billing. It shows that the platform records activity against the authenticated tenant context instead of merging all traffic into one undifferentiated account.
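If you script this step, it is easy to build the query string from real dates instead of hard-coding timestamps. The sketch below reuses the query parameters from the curl examples above; the `usageUrl` helper name is hypothetical.

```typescript
// Builds the usage query URL for a given time window.
// Parameter names are taken from the curl examples; usageUrl is a hypothetical helper.
function usageUrl(start: Date, end: Date): string {
  const params: Array<[string, string]> = [
    ["start_time", String(Math.floor(start.getTime() / 1000))], // Unix seconds
    ["end_time", String(Math.floor(end.getTime() / 1000))],
    ["bucket_width", "1d"],
    ["group_by", "api_key_id"],
  ];
  const query = params
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  return "https://apigw.mka1.com/api/v1/llm/usage/responses?" + query;
}

// Reproduces the window used in the curl examples (2024-01-01 to 2024-01-02 UTC).
const exampleUsageUrl = usageUrl(
  new Date(Date.UTC(2024, 0, 1)),
  new Date(Date.UTC(2024, 0, 2)),
);
```

Call the resulting URL once with each tenant's key, as in the curl examples, and compare the two responses.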

What this walkthrough proves

If you run the walkthrough above, you can make these exact claims:
  • Independent keys: Tenant A and Tenant B authenticate with different API keys.
  • Independent quotas: Tenant A and Tenant B can have different rate-limit settings and receive different 429 behavior.
  • Independent policies: Tenant A and Tenant B can set different guardrails and get different results on the same content.
  • Real isolation: A resource created under Tenant A is not readable under Tenant B.
The API surface is shared, but the authenticated tenant context, quota enforcement, policy configuration, and resource ownership are not shared.

How the identity model works underneath

Authentication in the MKA1 API has three layers:
  • Your API key identifies your account.
  • X-On-Behalf-Of identifies the end user your server is acting for.
  • An exchanged JWT gives a downstream service a short-lived credential derived from that API key and end-user context.
The short version is simple:
  • Send Authorization: Bearer <mka1-api-key> on every server-side request.
  • Add X-On-Behalf-Of when the request belongs to one of your end users.
  • Use POST /api/v1/authentication/api-key/exchange-token when another service should receive a short-lived token instead of your raw API key.
If you only need the usage pattern, start with the authentication guide. This page explains what happens after the request reaches the gateway.

The request path

The MKA1 API does not ask downstream services to validate bearer tokens on their own. Requests pass through the gateway first, and the gateway injects trusted identity headers for the rest of the platform. When you also send X-On-Behalf-Of, the gateway keeps that delegated end-user identity with the request. JWT exchange adds one extra step: your server first calls the exchange-token endpoint, and the receiving service then presents the returned JWT as its bearer token instead of your raw API key.

The three core patterns

Pattern 1: backend-only requests

Use this when your backend is calling the MKA1 API for its own workflow and there is no separate end user to track.
curl https://apigw.mka1.com/api/v1/llm/responses \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <mka1-api-key>' \
  --data '{
    "model": "openai:gpt-5-mini",
    "input": "Write a short release note for our API update."
  }'
const response = await fetch("https://apigw.mka1.com/api/v1/llm/responses", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.MKA1_API_KEY}`,
  },
  body: JSON.stringify({
    model: "openai:gpt-5-mini",
    input: "Write a short release note for our API update.",
  }),
});

Pattern 2: multi-user server integration

Use this when your backend is making the request for one of your own application users.
curl https://apigw.mka1.com/api/v1/llm/responses \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <mka1-api-key>' \
  --header 'X-On-Behalf-Of: user_123' \
  --data '{
    "model": "openai:gpt-5-mini",
    "input": "Summarize this support ticket."
  }'
const response = await fetch("https://apigw.mka1.com/api/v1/llm/responses", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.MKA1_API_KEY}`,
    "X-On-Behalf-Of": "user_123",
  },
  body: JSON.stringify({
    model: "openai:gpt-5-mini",
    input: "Summarize this support ticket.",
  }),
});
This is the right pattern when you want requests, files, responses, conversations, and usage to stay associated with a stable end-user ID from your own system.

Pattern 3: exchange your API key for a short-lived JWT

Use this when another service should receive a time-limited token instead of your long-lived API key.
curl https://apigw.mka1.com/api/v1/authentication/api-key/exchange-token \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <mka1-api-key>' \
  --data '{
    "audience": "https://my-service.example.com",
    "externalUserId": "user_123",
    "expiresIn": 3600,
    "permissions": ["agent:create", "agent:read"]
  }'
const response = await fetch("https://apigw.mka1.com/api/v1/authentication/api-key/exchange-token", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.MKA1_API_KEY}`,
  },
  body: JSON.stringify({
    audience: "https://my-service.example.com",
    externalUserId: "user_123",
    expiresIn: 3600,
    permissions: ["agent:create", "agent:read"],
  }),
});

const { token } = await response.json();

Which headers you send vs which headers the platform injects

Headers you send

Header         | Required | Meaning
Authorization  | Yes      | Your MKA1 API key or exchanged JWT in bearer format
X-On-Behalf-Of | No       | Your stable end-user identifier for delegated server-side requests

Trusted headers injected inside the platform

Header                          | Produced by | Meaning
X-User-ID                       | Gateway     | The authenticated account user ID
X-Api-Key-ID                    | Gateway     | The resolved API key ID
X-User-Role                     | Gateway     | The authenticated role context
X-Api-Key-Permissions           | Gateway     | Permissions attached to the API key
X-Exchange-JWT-External-User-ID | Gateway     | The delegated end-user ID
X-Exchange-JWT-Permissions      | Gateway     | Permissions embedded in the exchanged JWT
Clients send Authorization and sometimes X-On-Behalf-Of. Clients do not send the internal X-User-ID or X-Api-Key-ID headers directly. Those are derived by the gateway after validation.
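The mapping from a validated API key to the injected headers can be sketched as follows. The header names come from the table above; the `ValidatedKey` shape, the `trustedHeaders` helper, and the comma-separated serialization of permissions are assumptions made for illustration.

```typescript
// Illustrative model of the gateway's header injection after API key validation.
// ASSUMPTIONS: the ValidatedKey shape and comma-joined permission lists.
interface ValidatedKey {
  userId: string;
  apiKeyId: string;
  role: string;
  permissions: string[];
}

function trustedHeaders(key: ValidatedKey, onBehalfOf: string): Record<string, string> {
  const headers: Record<string, string> = {
    "X-User-ID": key.userId,
    "X-Api-Key-ID": key.apiKeyId,
    "X-User-Role": key.role,
    "X-Api-Key-Permissions": key.permissions.join(","),
  };
  // Delegated identity is only added when the client sent X-On-Behalf-Of.
  if (onBehalfOf !== "") {
    headers["X-Exchange-JWT-External-User-ID"] = onBehalfOf;
    headers["X-Exchange-JWT-Permissions"] = key.permissions.join(",");
  }
  return headers;
}

const demoKey: ValidatedKey = {
  userId: "account_user_1",
  apiKeyId: "key_1",
  role: "user",
  permissions: ["agent:create", "agent:read"],
};

const backendOnlyHeaders = trustedHeaders(demoKey, "");
const delegatedHeaders = trustedHeaders(demoKey, "user_123");
```

A backend-only request carries only the account and key headers, while a delegated request also carries the end-user ID.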

How JWT exchange works

POST /api/v1/authentication/api-key/exchange-token turns a long-lived API key into a short-lived token for another service. The request body has four meaningful fields:
  • audience: the exact service URL that should accept the JWT
  • externalUserId: the end-user ID placed into the JWT subject
  • expiresIn: token lifetime in seconds, from 300 to 2592000 (5 minutes to 30 days)
  • permissions: an optional subset of the API key permissions
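A client can check these constraints before calling the endpoint. The sketch below is a hypothetical pre-flight check based only on the field descriptions above; `exchangeRequestErrors` is not part of the API, and the server remains the authority on what it accepts.

```typescript
// Hypothetical client-side pre-flight check for an exchange-token request body.
interface ExchangeRequest {
  audience: string;
  externalUserId: string;
  expiresIn: number; // seconds
  permissions?: string[];
}

// Returns a list of problems; an empty list means the body looks acceptable.
function exchangeRequestErrors(body: ExchangeRequest, keyPermissions: string[]): string[] {
  const errors: string[] = [];
  if (body.audience === "") {
    errors.push("audience must be the service URL that should accept the JWT");
  }
  if (body.expiresIn < 300 || body.expiresIn > 2592000) {
    errors.push("expiresIn must be between 300 and 2592000 seconds");
  }
  // When permissions are omitted, the key's own permissions apply.
  const requested = body.permissions === undefined ? keyPermissions : body.permissions;
  const extra = requested.filter((p) => keyPermissions.indexOf(p) === -1);
  if (extra.length > 0) {
    errors.push("permissions not held by the API key: " + extra.join(", "));
  }
  return errors;
}

const keyPermissions = ["agent:create", "agent:read"];

const validBodyErrors = exchangeRequestErrors(
  {
    audience: "https://my-service.example.com",
    externalUserId: "user_123",
    expiresIn: 3600,
    permissions: ["agent:create", "agent:read"],
  },
  keyPermissions,
);

const tooBroadErrors = exchangeRequestErrors(
  {
    audience: "https://my-service.example.com",
    externalUserId: "user_123",
    expiresIn: 60,
    permissions: ["agent:delete"],
  },
  keyPermissions,
);
```

The second call reports both a too-short lifetime and a permission the key does not hold.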
The response is:
{
  "token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
}
A decoded token payload looks like this:
{
  "ak": "cm4xqw8ij0001w8lcmj9tn6v7",
  "sub": "user_123",
  "iat": 1734624000,
  "exp": 1734627600,
  "iss": "https://exchange.meetkai.com",
  "aud": "https://my-service.example.com",
  "permissions": ["agent:create", "agent:read"]
}
These claims matter most:
  • ak: the API key ID used for lookup and rate-limit enforcement
  • sub: your delegated end-user identity
  • aud: the service that should accept the token
  • permissions: the allowed capability set for that token
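On the receiving side, a service would typically check these claims after verifying the token's signature. The sketch below covers only the claim checks and assumes signature verification has already happened elsewhere; the `claimsProblem` helper is hypothetical.

```typescript
// Claim checks a receiving service might apply AFTER signature verification.
// Signature verification is out of scope for this sketch.
interface ExchangeClaims {
  ak: string;
  sub: string;
  iat: number;
  exp: number;
  iss: string;
  aud: string;
  permissions: string[];
}

// Returns a description of the first problem found, or null if the claims pass.
function claimsProblem(
  claims: ExchangeClaims,
  expectedAudience: string,
  requiredPermission: string,
  nowSeconds: number,
): string | null {
  if (claims.aud !== expectedAudience) return "wrong audience";
  if (claims.exp <= nowSeconds) return "token expired";
  if (claims.permissions.indexOf(requiredPermission) === -1) return "missing permission";
  return null;
}

// The decoded payload shown above.
const exampleClaims: ExchangeClaims = {
  ak: "cm4xqw8ij0001w8lcmj9tn6v7",
  sub: "user_123",
  iat: 1734624000,
  exp: 1734627600,
  iss: "https://exchange.meetkai.com",
  aud: "https://my-service.example.com",
  permissions: ["agent:create", "agent:read"],
};

const accepted = claimsProblem(exampleClaims, "https://my-service.example.com", "agent:read", 1734625000);
const wrongService = claimsProblem(exampleClaims, "https://other.example.com", "agent:read", 1734625000);
const expired = claimsProblem(exampleClaims, "https://my-service.example.com", "agent:read", 1734627600);
```

Checking aud first means a token minted for one service is rejected by another before any permission logic runs.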

How tenant scoping works in practice

The most important design choice is that end-user identity is explicit. If you send X-On-Behalf-Of: user_123, downstream services can keep resources and usage associated with that user. This matters for:
  • saved conversations
  • stored responses
  • files and vector stores
  • per-user usage and audit trails
  • delegated authorization checks
When you omit X-On-Behalf-Of, requests run as backend-only account work. When you include it, the request becomes a delegated request for a specific end user. Use a stable identifier from your own system. Do not use a mutable display name unless that is already your canonical user ID.

Rate limits and what callers should expect

API keys can carry custom rate-limit settings. The auth path enforces rate limits per API key before the request reaches downstream services. In practice, this means:
  • two different API keys can have different limits
  • one overloaded API key does not imply another key is exhausted
  • a 429 is part of the auth path, not a downstream model error
The exchange-token flow also stays tied to the originating API key. That means requests made with a downstream JWT still count against the rate limit of the source key. If you want separate traffic pools for different applications or workloads, use separate API keys.
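Because a 429 reflects the key's own limit, callers usually handle it by waiting and retrying rather than treating it as a model failure. The sketch below shows one common approach, capped exponential backoff; the helper names and the delay values are assumptions, and this page does not specify any retry guidance or response headers for 429.

```typescript
// Capped exponential backoff for 429 responses.
// ASSUMPTIONS: base delay, cap, and helper names are illustrative only.
function isRateLimited(status: number): boolean {
  return status === 429;
}

// attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
function retryDelayMs(attempt: number, baseMs: number = 1000, capMs: number = 60000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

const firstRetryDelay = retryDelayMs(0);   // 1000
const fourthRetryDelay = retryDelayMs(3);  // 8000
const cappedRetryDelay = retryDelayMs(10); // 60000
```

With a long rate-limit window, short backoff alone may not be enough, so it can also make sense to stop retrying after a few attempts and surface the error.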

Common mistakes

Sending X-On-Behalf-Of from browser code

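Do not expose your API key in browser or mobile client code. X-On-Behalf-Of is sent alongside the API key, so setting it from client code would mean shipping the key to the client. Your server should call the MKA1 API and attach X-On-Behalf-Of there.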

Using an unstable end-user identifier

Use a durable internal ID like user_123. Do not switch between email addresses, usernames, and display names for the same user.

Using the wrong audience in exchanged JWTs

The downstream service should validate the token for the intended audience. Set audience to the actual service URL that should accept the JWT.

Requesting permissions broader than the API key

The exchange-token endpoint only allows a subset of the API key permissions. If you request a permission the API key does not have, the exchange fails.

Treating internal propagated headers as public client headers

Headers like X-User-ID and X-Api-Key-ID are part of the trusted internal request path. They are not a substitute for Authorization.

Code-path appendix

The following snippets show the main implementation shape behind the public behavior.

Kong validates the bearer token and injects trusted headers

authorizationHeader = utils.StripBearer(authorizationHeader)

if utils.IsApiKey(authorizationHeader) {
    resp, err := c.authenticationClient.ValidateApiKey(ctx, authorizationHeader)
    if err != nil {
        c.handleAuthenticationServiceError(kong, err)
        return
    }

    c.writeApiKeyInfoToUnderlyingHeaders(kong, resp.GetApiKeyInfo())
    c.writeUserInfoToUnderlyingHeaders(kong, resp.GetUserInfo())

    onBehalfOfHeader, _ := kong.Request.GetHeader(headerOnBehalfOf)
    if onBehalfOfHeader != "" {
        c.writeExchangeJwtInfoToUnderlyingHeaders(kong, &authentication.ExchangeJWTInfo{
            ExternalUserId: onBehalfOfHeader,
            Permissions:    resp.GetApiKeyInfo().GetPermissions(),
        })
    }

    return
}
That is why downstream services can trust X-User-ID, X-Api-Key-ID, and X-Exchange-JWT-External-User-ID after gateway validation.

The exchange-token endpoint verifies the API key, checks permissions, and signs the JWT

const apiKeyResult = await auth.api.verifyApiKey({
  body: {
    key: apiKey,
  },
});

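if (!apiKeyResult.valid || !apiKeyResult.key) {
  ctx.set.status = 401;
  return { message: "Invalid API key" };
}

// Assumed binding (not shown in the original excerpt): the verified key record used below.
const verifiedKey = apiKeyResult.key;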

const apiKeyPermissions = Object.entries(verifiedKey.permissions ?? {}).map(
  ([key, value]) => `${key}:${value}`,
);
const permissions = validatedBody.permissions ?? apiKeyPermissions;

if (permissions.some((permission) => !apiKeyPermissions.includes(permission))) {
  ctx.set.status = 401;
  return { message: "Permissions mismatch" };
}
Then it signs a payload that includes the API key ID, delegated end-user subject, and filtered permissions:
payload: {
  ak: verifiedKey.id,
  sub: validatedBody.externalUserId,
  iat: DateTime.now().toUTC().toUnixInteger(),
  permissions: permissions,
}

Downstream services read identity from trusted headers

const apiKey = reqHeaders.get(KONG_API_KEY_HEADER)?.trim();
const userId = reqHeaders.get(KONG_USER_ID_HEADER)?.trim();
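const externalUserId = reqHeaders.get(KONG_EXTERNAL_USER_ID_HEADER)?.trim() ?? "";
// Assumed: userRole is read the same way; KONG_USER_ROLE_HEADER is a hypothetical constant name.
const userRole = reqHeaders.get(KONG_USER_ROLE_HEADER)?.trim();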

if (!apiKey?.length || !userId?.length) {
  throw KONG_AUTH_ERROR;
}

return { apiKey, userId, externalUserId, userRole };
This is why services behind the gateway usually do not parse bearer tokens themselves. They operate on a validated identity context instead.

Next steps

Use the authentication guide for the short version and copy-paste examples. Use the API overview to inspect the generated reference and the live OpenAPI spec. Then review the relevant endpoint pages in the API Reference for exact request and response details.