# Evaluating text stores

> Benchmark retrieval accuracy and latency for the MKA1 Text Store and Tables APIs using the BEIR SciFact dataset.

This guide benchmarks the MKA1 search APIs against a standard information retrieval dataset.
It exercises the retrieval side of a RAG pipeline on two storage backends (Text Store and Tables) at volume: indexing thousands of documents, running queries, and measuring accuracy and latency at each stage.

The benchmark uses [SciFact](https://huggingface.co/datasets/BeIR/scifact) from the BEIR benchmark suite — 5,183 scientific abstracts with 300 test queries and human-annotated relevance judgments.
All embeddings are generated with the `auto` model setting (4,096 dimensions).

## Results summary

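A retrieval query goes through two stages: **embedding inference** (converting the query text to a vector) and **database search** (finding the nearest vectors in the store). This guide measures them separately: queries are embedded ahead of time, so the end-to-end response times below cover the search request only (network, gateway, and database search), not embedding inference.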

| Metric         | Text Store | Tables |
| -------------- | ---------- | ------ |
| **NDCG\@10**   | 72.4       | 71.8   |
| **Recall\@10** | 83.1       | 82.6   |
| **MRR\@10**    | 68.9       | 68.3   |

| Latency                                                  | Text Store   | Tables       |
| -------------------------------------------------------- | ------------ | ------------ |
| **Embedding inference** (per query)                      | 12 ms        | 12 ms        |
| **Database search p50** (server-side)                    | 8 ms         | 6 ms         |
| **Database search p95** (server-side)                    | 14 ms        | 11 ms        |
| **End-to-end response p50** (network + gateway + search) | 45 ms        | 38 ms        |
| **End-to-end response p95** (network + gateway + search) | 82 ms        | 71 ms        |
| **Indexing throughput** (5,183 docs)                     | 182 docs/sec | 166 docs/sec |

The database search latency is reported by the server itself (`search_time_ms` in the Text Store response body, `searchTimeMs` in the Tables SDK response) and excludes network overhead and embedding inference. The end-to-end response time is what a client measures around the search request.

The rest of this guide shows how these numbers are produced, step by step.

## Setup

Install the `@meetkai/mka1` SDK and set the `MK_API_KEY` environment variable (`MKA1_BASE_URL` is optional and defaults to the public gateway).

```ts theme={null}
import { SDK } from '@meetkai/mka1';

const API_KEY = process.env.MK_API_KEY!;
const BASE_URL = process.env.MKA1_BASE_URL || 'https://apigw.mka1.com';

const sdk = new SDK({
  serverURL: BASE_URL,
  bearerAuth: `Bearer ${API_KEY}`,
});
```

## Step 1 — Load the SciFact dataset

Download the corpus, queries, and relevance judgments from Hugging Face.

```ts theme={null}
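async function fetchJsonl(url: string): Promise<any[]> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Failed to fetch ${url}: HTTP ${res.status}`);
  const text = await res.text();
  // One JSON object per line; skip blank lines
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}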

// Corpus: 5,183 scientific abstracts
const corpus = await fetchJsonl(
  'https://huggingface.co/datasets/BeIR/scifact/resolve/main/corpus.jsonl'
);

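// Queries: the file may also contain queries from other splits; filtered to the test set below
const allQueries = await fetchJsonl(
  'https://huggingface.co/datasets/BeIR/scifact/resolve/main/queries.jsonl'
);

// Relevance judgments: query_id → doc_id → relevance (binary)
const qrelsRaw = await fetch(
  'https://huggingface.co/datasets/BeIR/scifact/resolve/main/qrels/test.tsv'
);
const qrelsText = await qrelsRaw.text();
const qrels: Record<string, Record<string, number>> = {};
for (const line of qrelsText.trim().split('\n').slice(1)) {
  // Query ID is the first column; doc ID and relevance are the last two.
  // This reads both the 3-column BEIR layout and a 4-column TREC-style layout.
  const cols = line.split('\t');
  const queryId = cols[0];
  const docId = cols[cols.length - 2];
  const relevance = cols[cols.length - 1];
  qrels[queryId] ??= {};
  qrels[queryId][docId] = parseInt(relevance, 10);
}

// Keep only the 300 test queries, i.e. those with relevance judgments
const queries = allQueries.filter((q) => q._id in qrels);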

console.log(`Corpus: ${corpus.length} documents`);
console.log(`Queries: ${queries.length}`);
console.log(`Relevance judgments: ${Object.keys(qrels).length} queries with labels`);
```

<Accordion title="Expected output">
  ```
  Corpus: 5183 documents
  Queries: 300
  Relevance judgments: 300 queries with labels
  ```
</Accordion>
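
The qrels parsing step can be checked offline before running the full download. The following is a minimal sketch, assuming a header row followed by tab-separated lines with the query ID first and the document ID and relevance score last. `parseQrels` and `selectJudgedQueries` are illustrative helper names, not part of the MKA1 SDK, and the sample rows are made up.

```typescript
type Qrels = Record<string, Record<string, number>>;

// Parse a qrels TSV string. Assumes a header row, then one judgment per line
// with the query ID in the first column and the document ID and relevance
// score in the last two columns.
function parseQrels(tsv: string): Qrels {
  const qrels: Qrels = {};
  for (const line of tsv.trim().split(/\r?\n/).slice(1)) {
    const cols = line.split('\t');
    if (cols.length < 3) continue; // skip malformed lines
    const queryId = cols[0];
    const docId = cols[cols.length - 2];
    const relevance = Number.parseInt(cols[cols.length - 1], 10);
    if (Number.isNaN(relevance)) continue;
    (qrels[queryId] ??= {})[docId] = relevance;
  }
  return qrels;
}

// Keep only queries that have at least one relevance judgment.
function selectJudgedQueries<T extends { _id: string }>(queries: T[], qrels: Qrels): T[] {
  return queries.filter((q) => q._id in qrels);
}

// Made-up sample data for illustration only.
const sampleTsv = 'query-id\tcorpus-id\tscore\n1\t31715818\t1\n3\t14717500\t1\n3\t2739854\t1';
const sampleQrels = parseQrels(sampleTsv);
const sampleQueries = [
  { _id: '1', text: 'claim one' },
  { _id: '2', text: 'claim without test judgments' },
  { _id: '3', text: 'claim three' },
];
const judged = selectJudgedQueries(sampleQueries, sampleQrels);
console.log(Object.keys(sampleQrels).length, judged.map((q) => q._id).join(','));
// prints "2 1,3"
```

Filtering queries against the judgments keeps unlabelled queries out of the latency and accuracy runs, which matters if the queries file covers more than the test split.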

## Step 2 — Compute embeddings

Embed all documents and queries using `auto`.
The corpus is sent in batches of 64 texts to keep each request small.

```ts theme={null}
async function embedBatch(texts: string[], model = 'auto'): Promise<number[][]> {
  const res = await fetch(`${BASE_URL}/api/v1/llm/embeddings`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({ input: texts, model }),
  });
  // Fail fast rather than indexing a partial or empty batch
  if (!res.ok) throw new Error(`Embedding request failed: HTTP ${res.status}`);
  const json = await res.json();
  return json.data.map((d: any) => d.embedding);
}

const BATCH_SIZE = 64;

// Embed corpus
console.log('Embedding corpus...');
const corpusTexts = corpus.map((d) => `${d.title} ${d.text}`);
const corpusEmbeddings: number[][] = [];
const embedStart = performance.now();

for (let i = 0; i < corpusTexts.length; i += BATCH_SIZE) {
  const batch = corpusTexts.slice(i, i + BATCH_SIZE);
  const embeddings = await embedBatch(batch);
  corpusEmbeddings.push(...embeddings);
  if ((i / BATCH_SIZE) % 10 === 0) {
    console.log(`  ${corpusEmbeddings.length} / ${corpusTexts.length} documents embedded`);
  }
}

const embedMs = performance.now() - embedStart;
console.log(`Corpus embedding: ${(embedMs / 1000).toFixed(1)}s (${(embedMs / corpus.length).toFixed(1)} ms/doc)`);

// Embed all queries in one request; the per-query figure is the batch time divided by the query count
console.log('Embedding queries...');
const queryTexts = queries.map((q) => q.text);
const queryEmbedStart = performance.now();
const queryEmbeddings = await embedBatch(queryTexts);
const queryEmbedMs = performance.now() - queryEmbedStart;
const perQueryEmbedMs = queryEmbedMs / queryTexts.length;

console.log(`Query embedding: ${queryEmbeddings.length} queries in ${queryEmbedMs.toFixed(0)} ms`);
console.log(`Embedding inference: ${perQueryEmbedMs.toFixed(1)} ms/query`);
```

<Accordion title="Expected output">
  ```
  Embedding corpus...
    64 / 5183 documents embedded
    704 / 5183 documents embedded
    ...
  Corpus embedding: 42.3s (8.2 ms/doc)
  Embedding queries...
  Query embedding: 300 queries in 3600 ms
  Embedding inference: 12.0 ms/query
  ```
</Accordion>
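
The batching loop can be factored into a small reusable helper. This is a sketch rather than the guide's own implementation: `chunk` and `embedInBatches` are hypothetical names, and the embedding call is injected as a function so the batching logic can be exercised without network access.

```typescript
// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run `embed` over each chunk in order and flatten the results, checking
// that every batch returns exactly one vector per input text.
async function embedInBatches(
  texts: readonly string[],
  size: number,
  embed: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, size)) {
    const result = await embed(batch);
    if (result.length !== batch.length) {
      throw new Error(`expected ${batch.length} vectors, got ${result.length}`);
    }
    vectors.push(...result);
  }
  return vectors;
}

// With 5,183 items and a batch size of 64 there are 81 batches; the last holds 63.
const batchSizes = chunk(Array.from({ length: 5183 }, (_, i) => i), 64).map((b) => b.length);
console.log(batchSizes.length, batchSizes[batchSizes.length - 1]);
// prints "81 63"
```

Passing the guide's `embedBatch` as the `embed` argument would reproduce the loop above, with the added length check catching a response that silently drops inputs.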

## Step 3 — Benchmark the Text Store

Index all 5,183 documents, then run all 300 queries. Measure indexing throughput and search latency.

### Index documents

```ts theme={null}
const TEXT_STORE_NAME = `scifact_bench_${Date.now()}`;
const DIMENSION = 4096;
const HEADERS = {
  'Content-Type': 'application/json',
  Authorization: `Bearer ${API_KEY}`,
};

// Create text store
await fetch(`${BASE_URL}/api/v1/search/text-store/stores`, {
  method: 'POST',
  headers: HEADERS,
  body: JSON.stringify({ store_name: TEXT_STORE_NAME, dimension: DIMENSION }),
});

console.log(`Created text store: ${TEXT_STORE_NAME}`);

// Index in batches
const INDEX_BATCH = 100;
const indexStart = performance.now();
let indexed = 0;

for (let i = 0; i < corpus.length; i += INDEX_BATCH) {
  const batchDocs = corpus.slice(i, i + INDEX_BATCH);
  const batchVecs = corpusEmbeddings.slice(i, i + INDEX_BATCH);

  await fetch(`${BASE_URL}/api/v1/search/text-store/stores/${TEXT_STORE_NAME}/texts`, {
    method: 'POST',
    headers: HEADERS,
    body: JSON.stringify({
      texts: batchDocs.map((d) => `${d.title} ${d.text}`),
      vectors: batchVecs,
      group: 'scifact',
    }),
  });

  indexed += batchDocs.length;
  if (indexed % 500 === 0) console.log(`  Indexed ${indexed} / ${corpus.length}`);
}

const indexMs = performance.now() - indexStart;
console.log(`Indexing: ${corpus.length} docs in ${(indexMs / 1000).toFixed(1)}s`);
console.log(`Throughput: ${(corpus.length / (indexMs / 1000)).toFixed(0)} docs/sec`);
```

<Accordion title="Expected output">
  ```
  Created text store: scifact_bench_1774982400000
    Indexed 500 / 5183
    Indexed 1000 / 5183
    ...
    Indexed 5000 / 5183
  Indexing: 5183 docs in 28.4s
  Throughput: 182 docs/sec
  ```
</Accordion>

### Run queries and measure latency

Each search request returns a `search_time_ms` field — the server-side database search duration, excluding network round-trip and any upstream processing.
We measure both this server-side time and the full end-to-end client-observed response time.

```ts theme={null}
const K = 10;
const results: Record<string, Record<string, number>> = {};
const endToEndLatencies: number[] = [];
const dbLatencies: number[] = [];

console.log(`Running ${queries.length} queries against text store (top-${K})...`);

for (let i = 0; i < queries.length; i++) {
  const q = queries[i];
  const qVec = queryEmbeddings[i];

  const start = performance.now();
  const res = await fetch(
    `${BASE_URL}/api/v1/search/text-store/stores/${TEXT_STORE_NAME}/search`,
    {
      method: 'POST',
      headers: HEADERS,
      body: JSON.stringify({ query: q.text, vector: qVec, limit: K }),
    },
  );
  const searchResult = await res.json();
  const elapsed = performance.now() - start;

  endToEndLatencies.push(elapsed);
  dbLatencies.push(searchResult.search_time_ms ?? 0);

  // Map results back to corpus doc IDs by matching text
  results[q._id] = {};
  const hits = searchResult.results ?? [];
  for (let rank = 0; rank < hits.length; rank++) {
    const hitText = hits[rank].text;
    const doc = corpus.find((d) => `${d.title} ${d.text}` === hitText);
    if (doc) {
      results[q._id][doc._id] = 1.0 / (rank + 1); // score by rank
    }
  }
}

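function percentile(arr: number[], p: number) {
  const sorted = [...arr].sort((a, b) => a - b);
  // Clamp so that p = 1 returns the maximum instead of reading past the end
  const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * p));
  return sorted[idx];
}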

console.log(`\nText Store — end-to-end response time (${queries.length} queries):`);
console.log(`  p50: ${percentile(endToEndLatencies, 0.5).toFixed(0)} ms`);
console.log(`  p95: ${percentile(endToEndLatencies, 0.95).toFixed(0)} ms`);
console.log(`  p99: ${percentile(endToEndLatencies, 0.99).toFixed(0)} ms`);

console.log(`\nText Store — database search latency (server-side search_time_ms):`);
console.log(`  p50: ${percentile(dbLatencies, 0.5).toFixed(0)} ms`);
console.log(`  p95: ${percentile(dbLatencies, 0.95).toFixed(0)} ms`);
console.log(`  p99: ${percentile(dbLatencies, 0.99).toFixed(0)} ms`);
```

<Accordion title="Expected output">
  ```
  Running 300 queries against text store (top-10)...

  Text Store — end-to-end response time (300 queries):
    p50: 45 ms
    p95: 82 ms
    p99: 110 ms

  Text Store — database search latency (server-side search_time_ms):
    p50: 8 ms
    p95: 14 ms
    p99: 18 ms
  ```

  The gap between end-to-end and database latency is network round-trip and gateway overhead.
</Accordion>
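
Mapping each hit back to a document ID by scanning the corpus costs time proportional to the corpus size for every hit. A minimal sketch of the same mapping with a lookup table built once, assuming the store returns hit text exactly as it was indexed (the same assumption the query loop makes). `buildTextIndex` and `rankScores` are illustrative helpers, and the toy corpus is made up.

```typescript
interface CorpusDoc {
  _id: string;
  title: string;
  text: string;
}

// The indexed text form used in this guide: title and body joined by a space.
const indexedText = (d: CorpusDoc): string => `${d.title} ${d.text}`;

// Build a text → doc ID lookup once. If two documents share identical text,
// the first one wins.
function buildTextIndex(corpus: CorpusDoc[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const d of corpus) {
    const key = indexedText(d);
    if (!index.has(key)) index.set(key, d._id);
  }
  return index;
}

// Convert ranked hit texts into { docId: 1 / rank } scores. Unknown texts are
// skipped, and a document that appears twice keeps its best (earliest) rank.
function rankScores(hitTexts: string[], index: Map<string, string>): Record<string, number> {
  const scores: Record<string, number> = {};
  hitTexts.forEach((text, rank) => {
    const docId = index.get(text);
    if (docId !== undefined && !(docId in scores)) {
      scores[docId] = 1 / (rank + 1);
    }
  });
  return scores;
}

const toyCorpus: CorpusDoc[] = [
  { _id: 'd1', title: 'A', text: 'alpha' },
  { _id: 'd2', title: 'B', text: 'beta' },
];
const toyScores = rankScores(['B beta', 'not in corpus', 'A alpha'], buildTextIndex(toyCorpus));
console.log(toyScores);
```

In the toy run, `d2` is ranked first and scores 1, the unknown text is skipped, and `d1` at rank 3 scores 1/3.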

## Step 4 — Benchmark the Tables API

Index the same corpus into a table with an explicit schema, including a vector field for the embeddings, then run the same queries.

### Create table and index documents

```ts theme={null}
const TABLE_NAME = `scifact_table_${Date.now()}`;

await sdk.search.tables.createTable({
  name: TABLE_NAME,
  schema: {
    fields: [
      { name: 'doc_id', type: 'string', nullable: false },
      { name: 'content', type: 'string', nullable: false, index: 'FTS' },
      { name: 'embedding', type: 'vector', nullable: false, dimensions: DIMENSION },
    ],
  },
});

console.log(`Created table: ${TABLE_NAME}`);

// Index in batches
const tableIndexStart = performance.now();
indexed = 0;

for (let i = 0; i < corpus.length; i += INDEX_BATCH) {
  const batchDocs = corpus.slice(i, i + INDEX_BATCH);
  const batchVecs = corpusEmbeddings.slice(i, i + INDEX_BATCH);

  await sdk.search.tables.insertData({
    tableName: TABLE_NAME,
    insertDataRequest: {
      data: batchDocs.map((d, j) => ({
        doc_id: d._id,
        content: `${d.title} ${d.text}`,
        embedding: batchVecs[j],
      })),
      refresh: true,
    },
  });

  indexed += batchDocs.length;
  if (indexed % 500 === 0) console.log(`  Indexed ${indexed} / ${corpus.length}`);
}

const tableIndexMs = performance.now() - tableIndexStart;
console.log(`Table indexing: ${corpus.length} docs in ${(tableIndexMs / 1000).toFixed(1)}s`);
console.log(`Throughput: ${(corpus.length / (tableIndexMs / 1000)).toFixed(0)} docs/sec`);
```

<Accordion title="Expected output">
  ```
  Created table: scifact_table_1774982400000
    Indexed 500 / 5183
    ...
  Table indexing: 5183 docs in 31.2s
  Throughput: 166 docs/sec
  ```
</Accordion>

### Run queries with vector search

The Tables API also returns `searchTimeMs` — the server-side database search duration.

```ts theme={null}
const tableResults: Record<string, Record<string, number>> = {};
const tableEndToEnd: number[] = [];
const tableDbLatencies: number[] = [];

console.log(`Running ${queries.length} queries against table (vector_search, top-${K})...`);

for (let i = 0; i < queries.length; i++) {
  const q = queries[i];
  const qVec = queryEmbeddings[i];

  const start = performance.now();
  const res = await sdk.search.tables.searchData({
    tableName: TABLE_NAME,
    searchRequest: {
      operations: [
        {
          type: 'vector_search',
          field: 'embedding',
          vector: qVec,
          distanceType: 'cosine',
          limit: K,
        },
      ],
      returnColumns: ['doc_id', 'content'],
    },
  });
  const elapsed = performance.now() - start;

  tableEndToEnd.push(elapsed);
  tableDbLatencies.push(res.searchTimeMs ?? 0);

  tableResults[q._id] = {};
  const rows = res.results ?? [];
  for (let rank = 0; rank < rows.length; rank++) {
    tableResults[q._id][rows[rank].doc_id] = 1.0 / (rank + 1);
  }
}

console.log(`\nTables — end-to-end response time (${queries.length} queries):`);
console.log(`  p50: ${percentile(tableEndToEnd, 0.5).toFixed(0)} ms`);
console.log(`  p95: ${percentile(tableEndToEnd, 0.95).toFixed(0)} ms`);
console.log(`  p99: ${percentile(tableEndToEnd, 0.99).toFixed(0)} ms`);

console.log(`\nTables — database search latency (server-side searchTimeMs):`);
console.log(`  p50: ${percentile(tableDbLatencies, 0.5).toFixed(0)} ms`);
console.log(`  p95: ${percentile(tableDbLatencies, 0.95).toFixed(0)} ms`);
console.log(`  p99: ${percentile(tableDbLatencies, 0.99).toFixed(0)} ms`);
```

<Accordion title="Expected output">
  ```
  Running 300 queries against table (vector_search, top-10)...

  Tables — end-to-end response time (300 queries):
    p50: 38 ms
    p95: 71 ms
    p99: 95 ms

  Tables — database search latency (server-side searchTimeMs):
    p50: 6 ms
    p95: 11 ms
    p99: 15 ms
  ```
</Accordion>
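
The p50, p95, and p99 figures are order statistics over the per-query samples. The sketch below shows which sample the floor-based index convention used in this guide selects when there are 300 queries. `percentileAt` is a hypothetical standalone copy with the index clamped, and the samples are synthetic. Other percentile conventions (nearest-rank, interpolated) can differ by one rank.

```typescript
// Return the sample at index floor(n * p) of the sorted array, clamped to the
// last element so that p = 1 yields the maximum.
function percentileAt(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * p));
  return sorted[idx];
}

// Synthetic latencies 1..300 ms, so the value equals its 1-based rank.
const syntheticMs = Array.from({ length: 300 }, (_, i) => i + 1);
console.log(percentileAt(syntheticMs, 0.5));  // 151
console.log(percentileAt(syntheticMs, 0.95)); // 286
console.log(percentileAt(syntheticMs, 0.99)); // 298
console.log(percentileAt(syntheticMs, 1));    // 300
```

With 300 samples, p99 is therefore determined by the third-slowest query, so a single outlier does not move it but a handful of slow requests will.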

## Step 5 — Compute accuracy metrics

Evaluate retrieval quality using standard BEIR metrics: NDCG\@10, Recall\@10, and MRR\@10.

```ts theme={null}
function computeMetrics(
  results: Record<string, Record<string, number>>,
  qrels: Record<string, Record<string, number>>,
  k: number,
) {
  let totalNdcg = 0;
  let totalRecall = 0;
  let totalMrr = 0;
  let count = 0;

  for (const queryId of Object.keys(qrels)) {
    const relevant = qrels[queryId];
    const retrieved = results[queryId] ?? {};
    const totalRelevant = Object.values(relevant).filter((r) => r > 0).length;

    if (totalRelevant === 0) continue;
    count++;

    // Sort retrieved by score descending, take top-K
    const ranked = Object.entries(retrieved)
      .sort(([, a], [, b]) => b - a)
      .slice(0, k)
      .map(([docId]) => docId);

    // NDCG@K
    let dcg = 0;
    let idcg = 0;
    for (let i = 0; i < ranked.length; i++) {
      const rel = relevant[ranked[i]] ?? 0;
      if (rel > 0) dcg += 1 / Math.log2(i + 2);
    }
    const idealRanks = Math.min(totalRelevant, k);
    for (let i = 0; i < idealRanks; i++) {
      idcg += 1 / Math.log2(i + 2);
    }
    totalNdcg += idcg > 0 ? dcg / idcg : 0;

    // Recall@K
    const hits = ranked.filter((docId) => (relevant[docId] ?? 0) > 0).length;
    totalRecall += hits / totalRelevant;

    // MRR@K
    const firstRelevantRank = ranked.findIndex((docId) => (relevant[docId] ?? 0) > 0);
    totalMrr += firstRelevantRank >= 0 ? 1 / (firstRelevantRank + 1) : 0;
  }

  return {
    ndcg: (totalNdcg / count) * 100,
    recall: (totalRecall / count) * 100,
    mrr: (totalMrr / count) * 100,
    queriesEvaluated: count,
  };
}
```
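
A worked single-query example makes the three metrics easier to verify by hand. This is a minimal sketch with binary relevance and made-up document IDs. `queryMetrics` is an illustrative helper that mirrors the per-query arithmetic of `computeMetrics` without the averaging.

```typescript
// Metrics for one query, given a ranked list of doc IDs and the set of
// relevant doc IDs (binary relevance).
function queryMetrics(ranked: string[], relevant: Set<string>, k: number) {
  const topK = ranked.slice(0, k);

  // DCG: each relevant hit contributes 1 / log2(rank + 1), with rank starting at 1.
  let dcg = 0;
  topK.forEach((docId, i) => {
    if (relevant.has(docId)) dcg += 1 / Math.log2(i + 2);
  });

  // Ideal DCG: all relevant documents packed at the top of the list.
  let idcg = 0;
  for (let i = 0; i < Math.min(relevant.size, k); i++) {
    idcg += 1 / Math.log2(i + 2);
  }

  const hits = topK.filter((docId) => relevant.has(docId)).length;
  const first = topK.findIndex((docId) => relevant.has(docId));

  return {
    ndcg: idcg > 0 ? dcg / idcg : 0,
    recall: relevant.size > 0 ? hits / relevant.size : 0,
    mrr: first >= 0 ? 1 / (first + 1) : 0,
  };
}

// Relevant docs are b and d; the system returns a, b, c.
const m = queryMetrics(['a', 'b', 'c'], new Set(['b', 'd']), 10);
console.log(m.ndcg.toFixed(3), m.recall.toFixed(3), m.mrr.toFixed(3));
// prints "0.387 0.500 0.500"
```

Here the only relevant hit is at rank 2, so DCG is 1/log2(3) ≈ 0.631, the ideal DCG for two relevant documents is 1 + 0.631, and NDCG is their ratio. Recall is 1 of 2, and the reciprocal rank of the first relevant hit is 1/2.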

### Print results

```ts theme={null}
const textStoreMetrics = computeMetrics(results, qrels, K);
const tableMetrics = computeMetrics(tableResults, qrels, K);

console.log('=== Retrieval Accuracy (SciFact, 5,183 docs, 300 queries) ===');
console.log('');
console.log(`Metric         Text Store    Tables (vector_search)`);
console.log(`NDCG@10        ${textStoreMetrics.ndcg.toFixed(1)}          ${tableMetrics.ndcg.toFixed(1)}`);
console.log(`Recall@10      ${textStoreMetrics.recall.toFixed(1)}          ${tableMetrics.recall.toFixed(1)}`);
console.log(`MRR@10         ${textStoreMetrics.mrr.toFixed(1)}          ${tableMetrics.mrr.toFixed(1)}`);
console.log('');
console.log('=== Latency Breakdown ===');
console.log('');
console.log(`Stage                          Text Store       Tables`);
console.log(`Embedding inference (per q)    ${perQueryEmbedMs.toFixed(0)} ms            ${perQueryEmbedMs.toFixed(0)} ms`);
console.log(`DB search p50 (server-side)    ${percentile(dbLatencies, 0.5).toFixed(0)} ms             ${percentile(tableDbLatencies, 0.5).toFixed(0)} ms`);
console.log(`DB search p95 (server-side)    ${percentile(dbLatencies, 0.95).toFixed(0)} ms            ${percentile(tableDbLatencies, 0.95).toFixed(0)} ms`);
console.log(`End-to-end response p50        ${percentile(endToEndLatencies, 0.5).toFixed(0)} ms            ${percentile(tableEndToEnd, 0.5).toFixed(0)} ms`);
console.log(`End-to-end response p95        ${percentile(endToEndLatencies, 0.95).toFixed(0)} ms            ${percentile(tableEndToEnd, 0.95).toFixed(0)} ms`);
console.log(`Index throughput               ${(corpus.length / (indexMs / 1000)).toFixed(0)} docs/s        ${(corpus.length / (tableIndexMs / 1000)).toFixed(0)} docs/s`);
```

<Accordion title="Expected output">
  ```
  === Retrieval Accuracy (SciFact, 5,183 docs, 300 queries) ===

  Metric         Text Store    Tables (vector_search)
  NDCG@10        72.4          71.8
  Recall@10      83.1          82.6
  MRR@10         68.9          68.3

  === Latency Breakdown ===

  Stage                          Text Store       Tables
  Embedding inference (per q)    12 ms            12 ms
  DB search p50 (server-side)    8 ms             6 ms
  DB search p95 (server-side)    14 ms            11 ms
  End-to-end response p50        45 ms            38 ms
  End-to-end response p95        82 ms            71 ms
  Index throughput               182 docs/s       166 docs/s
  ```

  * **Embedding inference** is the same for both backends — it runs the same model.
  * **DB search** is the pure database time reported by the server (`search_time_ms` / `searchTimeMs`). This is where Text Store and Tables differ: Text Store runs hybrid search (vector + FTS), while Tables runs pure vector search.
  * **End-to-end response** includes network round-trip, gateway overhead, and database search. The difference between end-to-end and DB search is the overhead.
</Accordion>

## Step 6 — Cleanup

```ts theme={null}
await fetch(`${BASE_URL}/api/v1/search/text-store/stores/${TEXT_STORE_NAME}`, {
  method: 'DELETE',
  headers: HEADERS,
});
await sdk.search.tables.deleteTable({ tableName: TABLE_NAME });
console.log('Benchmark stores cleaned up.');
```

## Interpreting the results

### Accuracy context

For reference, approximate NDCG\@10 scores on SciFact from the BEIR leaderboard:

| Model                   | NDCG\@10 |
| ----------------------- | -------- |
| BM25 (lexical baseline) | \~66.5   |
| text-embedding-ada-002  | \~70–73  |
| text-embedding-3-large  | \~74–76  |
| State-of-the-art (2024) | \~78–80  |

NDCG\@10 scores in the 70–76 range indicate strong retrieval quality, competitive with leading embedding models.

### What to look for

* **NDCG\@10** is the primary metric. It rewards relevant documents near the top of the ranking and discounts each one logarithmically by its rank.
* **Recall\@10** measures how many relevant documents appear in the top 10 at all — important for RAG pipelines where downstream generation depends on retrieval completeness.
* **MRR\@10** measures how quickly the first relevant result appears — important for user-facing search.
* **Text Store vs Tables**: Text Store adds hybrid search (vector + keyword) automatically. Tables gives you explicit control over index type, distance metric, and filters.

### Reading the latency breakdown

The benchmark reports three latency figures:

1. **Embedding inference** — the time to convert the query text into a vector. This is model inference and is the same regardless of which storage backend you use.
2. **Database search** — the time the search engine spends finding nearest neighbors. Reported by the server in `search_time_ms` (Text Store) or `searchTimeMs` (Tables). This is pure vector/hybrid search time with no network overhead.
3. **End-to-end response** — what the client observes: network round-trip + gateway routing + database search.

If end-to-end latency is high but database search is fast, the bottleneck is network or gateway overhead.
If database search latency is high, consider a different index type (e.g., IVF\_HNSW\_SQ for faster approximate search).
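
The diagnosis above comes down to a subtraction. The sketch below uses the median figures from the results table. `LatencySample`, `overheadMs`, and `overheadShare` are illustrative names. Subtracting one median from another is only an approximation, so for a precise breakdown compute the overhead per request and take percentiles of that.

```typescript
interface LatencySample {
  endToEndMs: number; // client-observed time for the search request
  dbSearchMs: number; // server-reported search time
}

// Overhead is whatever the client waited for beyond the server-reported search time.
function overheadMs(s: LatencySample): number {
  return Math.max(0, s.endToEndMs - s.dbSearchMs);
}

// Fraction of the end-to-end time spent outside the database.
function overheadShare(s: LatencySample): number {
  return s.endToEndMs > 0 ? overheadMs(s) / s.endToEndMs : 0;
}

// Median (p50) figures from the results table in this guide.
const textStoreP50: LatencySample = { endToEndMs: 45, dbSearchMs: 8 };
const tablesP50: LatencySample = { endToEndMs: 38, dbSearchMs: 6 };

console.log(overheadMs(textStoreP50), overheadShare(textStoreP50).toFixed(2)); // 37 0.82
console.log(overheadMs(tablesP50), overheadShare(tablesP50).toFixed(2));       // 32 0.84
```

In both cases more than 80% of the median response time is spent outside the database, which points at network and gateway overhead rather than index choice.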

### Storage technology comparison

|                 | Text Store                                          | Tables                                                  |
| --------------- | --------------------------------------------------- | ------------------------------------------------------- |
| **Setup**       | Zero config — just name + dimension                 | Explicit schema, field types, indices                   |
| **Search**      | Hybrid (vector + full-text) automatic               | Compose operations: vector\_search, FTS, filter         |
| **Best for**    | Quick RAG prototyping, hybrid search out of the box | Custom schemas, filtered search, multi-index strategies |
| **Index types** | Automatic                                           | IVF\_FLAT, IVF\_PQ, IVF\_HNSW\_PQ, IVF\_HNSW\_SQ        |

## See also

* [Index and search text](/docs/search) for the full Text Store and Tables API reference.
* [Files and vector stores](/docs/files-and-vector-stores) for the OpenAI-compatible vector store API.
* [GraphRAG evaluation](/docs/graphrag) for multi-hop retrieval benchmarking.
