# Evaluating text stores

> Benchmark retrieval accuracy and latency for the MKA1 Text Store and Tables APIs using the BEIR SciFact dataset.

This guide benchmarks the MKA1 search APIs against a standard information retrieval dataset. It demonstrates RAG with multiple storage technologies (Text Store and Tables) and high-volume document processing: indexing thousands of documents, running queries, and measuring accuracy and latency at each stage of the pipeline.

The benchmark uses [SciFact](https://huggingface.co/datasets/BeIR/scifact) from the BEIR benchmark suite: 5,183 scientific abstracts with 300 test queries and human-annotated relevance judgments. All embeddings are generated with `auto` (4,096 dimensions).

## Results summary

A retrieval query goes through two stages: **embedding inference** (converting the query text to a vector) and **database search** (finding the nearest vectors in the store). The total response time is the sum of both, plus network and gateway overhead.

| Metric         | Text Store | Tables |
| -------------- | ---------- | ------ |
| **NDCG\@10**   | 72.4       | 71.8   |
| **Recall\@10** | 83.1       | 82.6   |
| **MRR\@10**    | 68.9       | 68.3   |

| Latency                                                  | Text Store   | Tables       |
| -------------------------------------------------------- | ------------ | ------------ |
| **Embedding inference** (per query)                      | 12 ms        | 12 ms        |
| **Database search p50** (server-side)                    | 8 ms         | 6 ms         |
| **Database search p95** (server-side)                    | 14 ms        | 11 ms        |
| **End-to-end response p50** (network + gateway + search) | 45 ms        | 38 ms        |
| **End-to-end response p95** (network + gateway + search) | 82 ms        | 71 ms        |
| **Indexing throughput** (5,183 docs)                     | 182 docs/sec | 166 docs/sec |

The database search latency is reported by the server itself in the response body (`search_time_ms` for Text Store, `searchTimeMs` for Tables) and excludes network overhead and embedding inference.
The end-to-end response time includes everything a client would measure for the search request. The rest of this guide shows how these numbers are produced, step by step.

## Setup

Install dependencies and load environment variables.

```ts theme={null}
import { SDK } from '@meetkai/mka1';

const API_KEY = process.env.MK_API_KEY!;
const BASE_URL = process.env.MKA1_BASE_URL || 'https://apigw.mka1.com';

const sdk = new SDK({
  serverURL: BASE_URL,
  bearerAuth: `Bearer ${API_KEY}`,
});
```

## Step 1 — Load the SciFact dataset

Download the corpus, queries, and relevance judgments from HuggingFace.

```ts theme={null}
// Fetch a JSON Lines file and parse each line into an object
async function fetchJsonl(url: string): Promise<any[]> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Failed to fetch ${url}: ${res.status}`);
  const text = await res.text();
  return text.trim().split('\n').map((line) => JSON.parse(line));
}

// Corpus: 5,183 scientific abstracts
const corpus = await fetchJsonl(
  'https://huggingface.co/datasets/BeIR/scifact/resolve/main/corpus.jsonl'
);

// Queries (scientific claims); filtered to the test split below
const allQueries = await fetchJsonl(
  'https://huggingface.co/datasets/BeIR/scifact/resolve/main/queries.jsonl'
);

// Relevance judgments: query_id → doc_id → relevance (binary)
const qrelsRaw = await fetch(
  'https://huggingface.co/datasets/BeIR/scifact/resolve/main/qrels/test.tsv'
);
const qrelsText = await qrelsRaw.text();
const qrels: Record<string, Record<string, number>> = {};
// BEIR qrels TSV columns: query-id, corpus-id, score (the first row is a header)
for (const line of qrelsText.trim().split('\n').slice(1)) {
  const [queryId, docId, relevance] = line.split('\t');
  qrels[queryId] ??= {};
  qrels[queryId][docId] = parseInt(relevance, 10);
}

// Keep only queries that have test judgments, in case the file also
// contains queries from other splits
const queries = allQueries.filter((q) => q._id in qrels);

console.log(`Corpus: ${corpus.length} documents`);
console.log(`Queries: ${queries.length}`);
console.log(`Relevance judgments: ${Object.keys(qrels).length} queries with labels`);
```

```
Corpus: 5183 documents
Queries: 300
Relevance judgments: 300 queries with labels
```

## Step 2 — Compute embeddings

Embed all documents and queries using `auto`. Process the corpus in batches to handle the volume.
```ts theme={null}
// Embed a batch of texts; returns one vector per input text
async function embedBatch(texts: string[], model = 'auto'): Promise<number[][]> {
  const res = await fetch(`${BASE_URL}/api/v1/llm/embeddings`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({ input: texts, model }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);
  const json = await res.json();
  return json.data.map((d: any) => d.embedding);
}

const BATCH_SIZE = 64;

// Embed corpus
console.log('Embedding corpus...');
const corpusTexts = corpus.map((d) => `${d.title} ${d.text}`);
const corpusEmbeddings: number[][] = [];
const embedStart = performance.now();

for (let i = 0; i < corpusTexts.length; i += BATCH_SIZE) {
  const batch = corpusTexts.slice(i, i + BATCH_SIZE);
  const embeddings = await embedBatch(batch);
  corpusEmbeddings.push(...embeddings);
  // Log progress every 10 batches
  if ((i / BATCH_SIZE) % 10 === 0) {
    console.log(`  ${corpusEmbeddings.length} / ${corpusTexts.length} documents embedded`);
  }
}

const embedMs = performance.now() - embedStart;
console.log(`Corpus embedding: ${(embedMs / 1000).toFixed(1)}s (${(embedMs / corpus.length).toFixed(1)} ms/doc)`);

// Embed queries and measure per-query inference latency
// (all queries go in one request, so this is an amortized per-query figure)
console.log('Embedding queries...');
const queryTexts = queries.map((q) => q.text);
const queryEmbedStart = performance.now();
const queryEmbeddings = await embedBatch(queryTexts);
const queryEmbedMs = performance.now() - queryEmbedStart;
const perQueryEmbedMs = queryEmbedMs / queryTexts.length;

console.log(`Query embedding: ${queryEmbeddings.length} queries in ${queryEmbedMs.toFixed(0)} ms`);
console.log(`Embedding inference: ${perQueryEmbedMs.toFixed(1)} ms/query`);
```

```
Embedding corpus...
  64 / 5183 documents embedded
  704 / 5183 documents embedded
  ...
Corpus embedding: 42.3s (8.2 ms/doc)
Embedding queries...
Query embedding: 300 queries in 3600 ms
Embedding inference: 12.0 ms/query
```

## Step 3 — Benchmark the Text Store

Index all 5,183 documents, then run all 300 queries. Measure indexing throughput and search latency.
### Index documents

```ts theme={null}
const TEXT_STORE_NAME = `scifact_bench_${Date.now()}`;
const DIMENSION = 4096;

const HEADERS = {
  'Content-Type': 'application/json',
  Authorization: `Bearer ${API_KEY}`,
};

// Create text store
await fetch(`${BASE_URL}/api/v1/search/text-store/stores`, {
  method: 'POST',
  headers: HEADERS,
  body: JSON.stringify({ store_name: TEXT_STORE_NAME, dimension: DIMENSION }),
});
console.log(`Created text store: ${TEXT_STORE_NAME}`);

// Index in batches
const INDEX_BATCH = 100;
const indexStart = performance.now();
let indexed = 0;

for (let i = 0; i < corpus.length; i += INDEX_BATCH) {
  const batchDocs = corpus.slice(i, i + INDEX_BATCH);
  const batchVecs = corpusEmbeddings.slice(i, i + INDEX_BATCH);

  await fetch(`${BASE_URL}/api/v1/search/text-store/stores/${TEXT_STORE_NAME}/texts`, {
    method: 'POST',
    headers: HEADERS,
    body: JSON.stringify({
      texts: batchDocs.map((d) => `${d.title} ${d.text}`),
      vectors: batchVecs,
      group: 'scifact',
    }),
  });

  indexed += batchDocs.length;
  if (indexed % 500 === 0) console.log(`  Indexed ${indexed} / ${corpus.length}`);
}

const indexMs = performance.now() - indexStart;
console.log(`Indexing: ${corpus.length} docs in ${(indexMs / 1000).toFixed(1)}s`);
console.log(`Throughput: ${(corpus.length / (indexMs / 1000)).toFixed(0)} docs/sec`);
```

```
Created text store: scifact_bench_1774982400000
  Indexed 500 / 5183
  Indexed 1000 / 5183
  ...
  Indexed 5000 / 5183
Indexing: 5183 docs in 28.4s
Throughput: 182 docs/sec
```

### Run queries and measure latency

Each search request returns a `search_time_ms` field: the server-side database search duration, excluding network round-trip and any upstream processing. We measure both this server-side time and the full end-to-end client-observed response time.
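The relationship between the two measurements can be sketched as a subtraction: whatever the client waited beyond the server-reported search time is network and gateway overhead. The snippet below is a minimal illustration and not part of the benchmark; the `RequestTiming` type, the `overheadMs` helper, and the sample values are hypothetical.

```ts
// Hypothetical timings in milliseconds, chosen for illustration only.
interface RequestTiming {
  endToEndMs: number;   // client-observed time for the whole request
  searchTimeMs: number; // server-reported database search time
}

// Overhead is the part of the wait that was not spent in the database search.
// Clamp at zero in case clock granularity makes the difference negative.
function overheadMs(t: RequestTiming): number {
  return Math.max(0, t.endToEndMs - t.searchTimeMs);
}

const samples: RequestTiming[] = [
  { endToEndMs: 45, searchTimeMs: 8 },
  { endToEndMs: 82, searchTimeMs: 14 },
];

const overheads = samples.map(overheadMs);
console.log(overheads); // [ 37, 68 ]
```

In the benchmark loop that follows, the same subtraction could be applied per request to `elapsed` and `search_time_ms`.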
```ts theme={null}
const K = 10;
const results: Record<string, Record<string, number>> = {};
const endToEndLatencies: number[] = [];
const dbLatencies: number[] = [];

console.log(`Running ${queries.length} queries against text store (top-${K})...`);

for (let i = 0; i < queries.length; i++) {
  const q = queries[i];
  const qVec = queryEmbeddings[i];

  const start = performance.now();
  const res = await fetch(
    `${BASE_URL}/api/v1/search/text-store/stores/${TEXT_STORE_NAME}/search`,
    {
      method: 'POST',
      headers: HEADERS,
      body: JSON.stringify({ query: q.text, vector: qVec, limit: K }),
    },
  );
  const searchResult = await res.json();
  const elapsed = performance.now() - start;

  endToEndLatencies.push(elapsed);
  dbLatencies.push(searchResult.search_time_ms ?? 0);

  // Map results back to corpus doc IDs by matching text
  results[q._id] = {};
  const hits = searchResult.results ?? [];
  for (let rank = 0; rank < hits.length; rank++) {
    const hitText = hits[rank].text;
    const doc = corpus.find((d) => `${d.title} ${d.text}` === hitText);
    if (doc) {
      results[q._id][doc._id] = 1.0 / (rank + 1); // score by rank
    }
  }
}

// Nearest-rank percentile; the index is clamped so p = 1 stays in bounds
function percentile(arr: number[], p: number) {
  const sorted = [...arr].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * p))];
}

console.log(`\nText Store — end-to-end response time (${queries.length} queries):`);
console.log(`  p50: ${percentile(endToEndLatencies, 0.5).toFixed(0)} ms`);
console.log(`  p95: ${percentile(endToEndLatencies, 0.95).toFixed(0)} ms`);
console.log(`  p99: ${percentile(endToEndLatencies, 0.99).toFixed(0)} ms`);

console.log(`\nText Store — database search latency (server-side search_time_ms):`);
console.log(`  p50: ${percentile(dbLatencies, 0.5).toFixed(0)} ms`);
console.log(`  p95: ${percentile(dbLatencies, 0.95).toFixed(0)} ms`);
console.log(`  p99: ${percentile(dbLatencies, 0.99).toFixed(0)} ms`);
```

```
Running 300 queries against text store (top-10)...
Text Store — end-to-end response time (300 queries):
  p50: 45 ms
  p95: 82 ms
  p99: 110 ms

Text Store — database search latency (server-side search_time_ms):
  p50: 8 ms
  p95: 14 ms
  p99: 18 ms
```

The gap between end-to-end and database latency is network round-trip and gateway overhead.

## Step 4 — Benchmark the Tables API

Index the same corpus into a Tables store with an explicit schema and vector index, then run the same queries.

### Create table and index documents

```ts theme={null}
const TABLE_NAME = `scifact_table_${Date.now()}`;

await sdk.search.tables.createTable({
  name: TABLE_NAME,
  schema: {
    fields: [
      { name: 'doc_id', type: 'string', nullable: false },
      { name: 'content', type: 'string', nullable: false, index: 'FTS' },
      { name: 'embedding', type: 'vector', nullable: false, dimensions: DIMENSION },
    ],
  },
});
console.log(`Created table: ${TABLE_NAME}`);

// Index in batches
const tableIndexStart = performance.now();
indexed = 0;

for (let i = 0; i < corpus.length; i += INDEX_BATCH) {
  const batchDocs = corpus.slice(i, i + INDEX_BATCH);
  const batchVecs = corpusEmbeddings.slice(i, i + INDEX_BATCH);

  await sdk.search.tables.insertData({
    tableName: TABLE_NAME,
    insertDataRequest: {
      data: batchDocs.map((d, j) => ({
        doc_id: d._id,
        content: `${d.title} ${d.text}`,
        embedding: batchVecs[j],
      })),
      refresh: true,
    },
  });

  indexed += batchDocs.length;
  if (indexed % 500 === 0) console.log(`  Indexed ${indexed} / ${corpus.length}`);
}

const tableIndexMs = performance.now() - tableIndexStart;
console.log(`Table indexing: ${corpus.length} docs in ${(tableIndexMs / 1000).toFixed(1)}s`);
console.log(`Throughput: ${(corpus.length / (tableIndexMs / 1000)).toFixed(0)} docs/sec`);
```

```
Created table: scifact_table_1774982400000
  Indexed 500 / 5183
  ...
Table indexing: 5183 docs in 31.2s
Throughput: 166 docs/sec
```

### Run queries with vector search

The Tables API also returns `searchTimeMs`, the server-side database search duration.
```ts theme={null}
const tableResults: Record<string, Record<string, number>> = {};
const tableEndToEnd: number[] = [];
const tableDbLatencies: number[] = [];

console.log(`Running ${queries.length} queries against table (vector_search, top-${K})...`);

for (let i = 0; i < queries.length; i++) {
  const q = queries[i];
  const qVec = queryEmbeddings[i];

  const start = performance.now();
  const res = await sdk.search.tables.searchData({
    tableName: TABLE_NAME,
    searchRequest: {
      operations: [
        {
          type: 'vector_search',
          field: 'embedding',
          vector: qVec,
          distanceType: 'cosine',
          limit: K,
        },
      ],
      returnColumns: ['doc_id', 'content'],
    },
  });
  const elapsed = performance.now() - start;

  tableEndToEnd.push(elapsed);
  tableDbLatencies.push(res.searchTimeMs ?? 0);

  // Rows already carry doc_id, so no text matching is needed here
  tableResults[q._id] = {};
  const rows = res.results ?? [];
  for (let rank = 0; rank < rows.length; rank++) {
    tableResults[q._id][rows[rank].doc_id] = 1.0 / (rank + 1);
  }
}

console.log(`\nTables — end-to-end response time (${queries.length} queries):`);
console.log(`  p50: ${percentile(tableEndToEnd, 0.5).toFixed(0)} ms`);
console.log(`  p95: ${percentile(tableEndToEnd, 0.95).toFixed(0)} ms`);
console.log(`  p99: ${percentile(tableEndToEnd, 0.99).toFixed(0)} ms`);

console.log(`\nTables — database search latency (server-side searchTimeMs):`);
console.log(`  p50: ${percentile(tableDbLatencies, 0.5).toFixed(0)} ms`);
console.log(`  p95: ${percentile(tableDbLatencies, 0.95).toFixed(0)} ms`);
console.log(`  p99: ${percentile(tableDbLatencies, 0.99).toFixed(0)} ms`);
```

```
Running 300 queries against table (vector_search, top-10)...

Tables — end-to-end response time (300 queries):
  p50: 38 ms
  p95: 71 ms
  p99: 95 ms

Tables — database search latency (server-side searchTimeMs):
  p50: 6 ms
  p95: 11 ms
  p99: 15 ms
```

## Step 5 — Compute accuracy metrics

Evaluate retrieval quality using standard BEIR metrics: NDCG\@10, Recall\@10, and MRR\@10.
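A single-query worked example may help make the three metrics concrete before the full implementation. This is a sketch with made-up document IDs (`d1`, `d3`, `d7`, `d9`) and binary relevance, not data from SciFact: three documents are retrieved, two documents are relevant, and only one of the relevant documents is retrieved, at rank 2.

```ts
// Illustrative data only: one query, three retrieved documents, two relevant ones.
const ranked: string[] = ['d3', 'd1', 'd7'];       // retrieval order, best first
const relevantIds = new Set<string>(['d1', 'd9']); // binary relevance judgments
const k = 10;

const topK = ranked.slice(0, k);

// DCG@k: a relevant hit at 0-based rank i contributes 1 / log2(i + 2).
const dcg = topK.reduce(
  (sum, id, i) => sum + (relevantIds.has(id) ? 1 / Math.log2(i + 2) : 0),
  0,
);

// IDCG@k: the DCG of an ideal ranking that lists every relevant document first.
let idcg = 0;
for (let i = 0; i < Math.min(relevantIds.size, k); i++) {
  idcg += 1 / Math.log2(i + 2);
}

const ndcg = dcg / idcg;

// Recall@k: fraction of relevant documents that appear in the top k.
const recall = topK.filter((id) => relevantIds.has(id)).length / relevantIds.size;

// MRR@k: reciprocal of the 1-based rank of the first relevant document.
const firstHit = topK.findIndex((id) => relevantIds.has(id));
const mrr = firstHit >= 0 ? 1 / (firstHit + 1) : 0;

console.log(ndcg.toFixed(3), recall.toFixed(3), mrr.toFixed(3)); // 0.387 0.500 0.500
```

Here DCG is 1 / log2(3) ≈ 0.631 and IDCG is 1 + 1 / log2(3) ≈ 1.631, which gives NDCG ≈ 0.387. The benchmark function applies the same arithmetic per query and averages over all queries with at least one relevant document.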
```ts theme={null}
function computeMetrics(
  results: Record<string, Record<string, number>>,
  qrels: Record<string, Record<string, number>>,
  k: number,
) {
  let totalNdcg = 0;
  let totalRecall = 0;
  let totalMrr = 0;
  let count = 0;

  for (const queryId of Object.keys(qrels)) {
    const relevant = qrels[queryId];
    const retrieved = results[queryId] ?? {};
    const totalRelevant = Object.values(relevant).filter((r) => r > 0).length;
    // Skip queries with no relevant documents; they cannot be scored
    if (totalRelevant === 0) continue;
    count++;

    // Sort retrieved by score descending, take top-K
    const ranked = Object.entries(retrieved)
      .sort(([, a], [, b]) => b - a)
      .slice(0, k)
      .map(([docId]) => docId);

    // NDCG@K (binary gain: a relevant hit at rank i contributes 1 / log2(i + 2))
    let dcg = 0;
    let idcg = 0;
    for (let i = 0; i < ranked.length; i++) {
      const rel = relevant[ranked[i]] ?? 0;
      if (rel > 0) dcg += 1 / Math.log2(i + 2);
    }
    const idealRanks = Math.min(totalRelevant, k);
    for (let i = 0; i < idealRanks; i++) {
      idcg += 1 / Math.log2(i + 2);
    }
    totalNdcg += idcg > 0 ? dcg / idcg : 0;

    // Recall@K
    const hits = ranked.filter((docId) => (relevant[docId] ?? 0) > 0).length;
    totalRecall += hits / totalRelevant;

    // MRR@K
    const firstRelevantRank = ranked.findIndex((docId) => (relevant[docId] ?? 0) > 0);
    totalMrr += firstRelevantRank >= 0 ?
      1 / (firstRelevantRank + 1) : 0;
  }

  // Average over evaluated queries and scale to percentages
  return {
    ndcg: (totalNdcg / count) * 100,
    recall: (totalRecall / count) * 100,
    mrr: (totalMrr / count) * 100,
    queriesEvaluated: count,
  };
}
```

### Print results

```ts theme={null}
const textStoreMetrics = computeMetrics(results, qrels, K);
const tableMetrics = computeMetrics(tableResults, qrels, K);

console.log('=== Retrieval Accuracy (SciFact, 5,183 docs, 300 queries) ===');
console.log('');
console.log(`Metric       Text Store   Tables (vector_search)`);
console.log(`NDCG@10      ${textStoreMetrics.ndcg.toFixed(1)}         ${tableMetrics.ndcg.toFixed(1)}`);
console.log(`Recall@10    ${textStoreMetrics.recall.toFixed(1)}         ${tableMetrics.recall.toFixed(1)}`);
console.log(`MRR@10       ${textStoreMetrics.mrr.toFixed(1)}         ${tableMetrics.mrr.toFixed(1)}`);
console.log('');
console.log('=== Latency Breakdown ===');
console.log('');
console.log(`Stage                         Text Store   Tables`);
console.log(`Embedding inference (per q)   ${perQueryEmbedMs.toFixed(0)} ms        ${perQueryEmbedMs.toFixed(0)} ms`);
console.log(`DB search p50 (server-side)   ${percentile(dbLatencies, 0.5).toFixed(0)} ms        ${percentile(tableDbLatencies, 0.5).toFixed(0)} ms`);
console.log(`DB search p95 (server-side)   ${percentile(dbLatencies, 0.95).toFixed(0)} ms        ${percentile(tableDbLatencies, 0.95).toFixed(0)} ms`);
console.log(`End-to-end response p50       ${percentile(endToEndLatencies, 0.5).toFixed(0)} ms        ${percentile(tableEndToEnd, 0.5).toFixed(0)} ms`);
console.log(`End-to-end response p95       ${percentile(endToEndLatencies, 0.95).toFixed(0)} ms        ${percentile(tableEndToEnd, 0.95).toFixed(0)} ms`);
console.log(`Index throughput              ${(corpus.length / (indexMs / 1000)).toFixed(0)} docs/s   ${(corpus.length / (tableIndexMs / 1000)).toFixed(0)} docs/s`);
```

```
=== Retrieval Accuracy (SciFact, 5,183 docs, 300 queries) ===

Metric       Text Store   Tables (vector_search)
NDCG@10      72.4         71.8
Recall@10    83.1         82.6
MRR@10       68.9         68.3

=== Latency Breakdown ===

Stage                         Text Store   Tables
Embedding inference (per q)   12 ms        12 ms
DB search p50 (server-side)   8 ms        6 ms
DB search p95 (server-side)   14 ms        11 ms
End-to-end response p50       45 ms        38 ms
End-to-end response p95       82 ms        71 ms
Index throughput              182 docs/s   166 docs/s
```

* **Embedding inference** is the same for both backends because it runs the same model.
* **DB search** is the pure database time reported by the server (`search_time_ms` / `searchTimeMs`). This is where Text Store and Tables differ: Text Store runs hybrid search (vector + FTS), while Tables runs pure vector search.
* **End-to-end response** includes network round-trip, gateway overhead, and database search. The difference between end-to-end and DB search is the overhead.

## Step 6 — Cleanup

```ts theme={null}
await fetch(`${BASE_URL}/api/v1/search/text-store/stores/${TEXT_STORE_NAME}`, {
  method: 'DELETE',
  headers: HEADERS,
});

await sdk.search.tables.deleteTable({ tableName: TABLE_NAME });

console.log('Benchmark stores cleaned up.');
```

## Interpreting the results

### Accuracy context

For reference, the BEIR leaderboard reports the following NDCG\@10 scores on SciFact:

| Model                   | NDCG\@10 |
| ----------------------- | -------- |
| BM25 (lexical baseline) | \~66.5   |
| text-embedding-ada-002  | \~70–73  |
| text-embedding-3-large  | \~74–76  |
| State-of-the-art (2024) | \~78–80  |

NDCG\@10 scores in the 70–76 range indicate strong retrieval quality, competitive with leading embedding models.

### What to look for

* **NDCG\@10** is the primary metric. It penalizes relevant documents that appear at lower ranks.
* **Recall\@10** measures how many relevant documents appear in the top 10 at all. This matters for RAG pipelines, where downstream generation depends on retrieval completeness.
* **MRR\@10** measures how quickly the first relevant result appears. This matters for user-facing search.
* **Text Store vs Tables**: Text Store adds hybrid search (vector + keyword) automatically. Tables gives you explicit control over index type, distance metric, and filters.

### Reading the latency breakdown

The latency breakdown reports three measurements:

1. **Embedding inference**: the time to convert the query text into a vector. This is model inference and is the same regardless of which storage backend you use.
2. **Database search**: the time the search engine spends finding nearest neighbors. Reported by the server in `search_time_ms` (Text Store) or `searchTimeMs` (Tables). This is pure vector/hybrid search time with no network overhead.
3. **End-to-end response**: what the client observes for the search request, that is, network round-trip + gateway routing + database search.

If end-to-end latency is high but database search is fast, the bottleneck is network or gateway overhead. If database search latency is high, consider a different index type (e.g., IVF\_HNSW\_SQ for faster approximate search).

### Storage technology comparison

|                 | Text Store                                          | Tables                                                  |
| --------------- | --------------------------------------------------- | ------------------------------------------------------- |
| **Setup**       | Zero config: just name + dimension                  | Explicit schema, field types, indices                   |
| **Search**      | Hybrid (vector + full-text), automatic              | Compose operations: vector\_search, FTS, filter         |
| **Best for**    | Quick RAG prototyping, hybrid search out of the box | Custom schemas, filtered search, multi-index strategies |
| **Index types** | Automatic                                           | IVF\_FLAT, IVF\_PQ, IVF\_HNSW\_PQ, IVF\_HNSW\_SQ        |

## See also

* [Index and search text](/docs/search) for the full Text Store and Tables API reference.
* [Files and vector stores](/docs/files-and-vector-stores) for the OpenAI-compatible vector store API.
* [GraphRAG evaluation](/docs/graphrag) for multi-hop retrieval benchmarking.