Long-term memory

The history tool gives models long-term memory that persists across sessions. When enabled, every request-response pair is automatically stored and indexed. The model can then semantically search past interactions to recall information from earlier conversations.

How it works

1. Add { type: "history" } to the tools array in your request.
2. The model receives a history function that it can call with a search query.
3. Past conversations are searched by semantic similarity using vector embeddings.
4. After each response completes, the user message and assistant reply are stored automatically in the background.

Memory is scoped per end-user — each X-On-Behalf-Of user ID gets an isolated history store. Different end-users cannot see each other’s history.
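
The store-then-search flow and the per-user isolation described above can be pictured with a small in-memory sketch. This is an illustration under stated assumptions, not the service's implementation: `HistoryStore`, `embed`, and `cosine` are hypothetical names, and the word-count vector is only a toy stand-in for a real embedding model (it matches on shared words, whereas real embeddings capture meaning).

```python
import math
import re
from collections import Counter, defaultdict


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class HistoryStore:
    """Hypothetical per-user history store (illustration only)."""

    def __init__(self):
        # One isolated list of (vector, text) entries per end-user ID.
        self._entries = defaultdict(list)

    def store(self, user_id, user_message, assistant_reply):
        text = f"User: {user_message}\nAssistant: {assistant_reply}"
        self._entries[user_id].append((embed(text), text))

    def search(self, user_id, query, limit=10):
        # Only this user's entries are ever scored.
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), text)
                  for vec, text in self._entries.get(user_id, [])]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:limit] if score > 0]


store = HistoryStore()
store.store("alice", "Remember this: my favorite color is blue.", "Got it.")
store.store("bob", "Remember this: my favorite color is green.", "Noted.")

# Searching as alice never touches bob's entries.
print(store.search("alice", "What is my favorite color?"))
```

The point of the sketch is the keying: every read and write goes through the end-user ID, so a query made on behalf of one user can only rank that user's past interactions.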

Enable the history tool

mka1 llm responses create \
  --model auto \
  --body '{
    "input": "Remember this: my favorite color is blue.",
    "tools": [{ "type": "history" }],
    "store": true
  }' \
  -H 'X-On-Behalf-Of: <end-user-id>'

Set store: true so the conversation is persisted and available for future recall.

Recall information from a previous session

In a later request, whether minutes, hours, or days later, the model can search its history to find relevant past interactions. The model decides when to call the history tool based on the user’s question. Send the same X-On-Behalf-Of value that was used when the information was stored, because history is isolated per end user.

mka1 llm responses create \
  --model auto \
  --body '{
    "input": "What is my favorite color?",
    "tools": [{ "type": "history" }],
    "store": true
  }' \
  -H 'X-On-Behalf-Of: <end-user-id>'

Full example: store and retrieve across sessions

This example shows the complete flow: storing information in one request and retrieving it in a separate request. Both requests send the same X-On-Behalf-Of value so that the second request searches the history written by the first.

# Session 1: Tell the model something to remember
mka1 llm responses create \
  --model auto \
  --body '{
    "input": "Remember this: the project deadline is March 15th and the budget is $50,000.",
    "tools": [{ "type": "history" }],
    "store": true
  }' \
  -H 'X-On-Behalf-Of: <end-user-id>'

# Session 2: Ask about it later
mka1 llm responses create \
  --model auto \
  --body '{
    "input": "What is the project deadline and budget?",
    "tools": [{ "type": "history" }],
    "store": true
  }' \
  -H 'X-On-Behalf-Of: <end-user-id>'
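
When the two-session flow is scripted rather than typed by hand, it can help to generate the JSON body and the shell quoting programmatically. The sketch below only builds and prints the command lines and does not run them. The subcommand and flags are copied from the examples above. The helper name `build_history_command` and the user ID `user-123` are hypothetical, and the sketch assumes the CLI is installed as `mka1` on the PATH.

```python
import json
import shlex


def build_history_command(end_user_id, prompt):
    """Build the argument list for one history-enabled request."""
    body = {
        "input": prompt,
        "tools": [{"type": "history"}],
        "store": True,
    }
    return [
        "mka1", "llm", "responses", "create",
        "--model", "auto",
        "--body", json.dumps(body),
        "-H", f"X-On-Behalf-Of: {end_user_id}",
    ]


end_user_id = "user-123"  # hypothetical ID; reuse the same value in every session
prompts = [
    "Remember this: the project deadline is March 15th and the budget is $50,000.",
    "What is the project deadline and budget?",
]
for prompt in prompts:
    # shlex.join quotes each argument so the line is safe to paste into a shell.
    print(shlex.join(build_history_command(end_user_id, prompt)))
```

Building the body with `json.dumps` avoids hand-escaping quotes inside the prompt, and passing the same `end_user_id` to both calls keeps the store and recall requests in the same history scope.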

Behavior details

Aspect	Detail
Storage	Automatic — each request/response pair is indexed after the response completes
Search	Semantic — uses vector embeddings, not keyword matching
Scope	Per end-user — isolated by `X-On-Behalf-Of` header
Indexing	Background — does not add latency to the response
Results	Up to 10 most relevant past interactions returned per search
Entry size	Text truncated to 7,500 characters per entry for embedding

When to use the history tool

Personalization: Remember user preferences, names, or context across sessions
Project continuity: Recall decisions, deadlines, or requirements discussed earlier
Support workflows: Maintain context about a user’s issue history
Assistants: Build assistants that learn and adapt to individual users over time

Next steps

Conversations — manage multi-turn exchanges within a single session
Files and vector stores — store and search documents
Generate a response — text requests and multi-turn exchanges
