All documents and queries are embedded with `meetkai:qwen3-embedding-8b` (4,096 dimensions).
## Results summary
A retrieval query goes through two stages: embedding inference (converting the query text to a vector) and database search (finding the nearest vectors in the store). The total response time is the sum of both.

| Metric | Text Store | Tables |
|---|---|---|
| NDCG@10 | 72.4 | 71.8 |
| Recall@10 | 83.1 | 82.6 |
| MRR@10 | 68.9 | 68.3 |
| Latency | Text Store | Tables |
|---|---|---|
| Embedding inference (per query) | 12 ms | 12 ms |
| Database search p50 (server-side) | 8 ms | 6 ms |
| Database search p95 (server-side) | 14 ms | 11 ms |
| End-to-end response p50 (network + embed + search) | 45 ms | 38 ms |
| End-to-end response p95 (network + embed + search) | 82 ms | 71 ms |
| Indexing throughput (5,183 docs) | 182 docs/sec | 166 docs/sec |
Database search time is the duration reported by the server (as `search_time_ms` in the response body) and excludes network overhead and embedding inference. The end-to-end response time includes everything a client would measure.
The rest of this guide shows how these numbers are produced, step by step.
## Setup

Install dependencies and load environment variables.

## Step 1 — Load the SciFact dataset
Download the corpus, queries, and relevance judgments from HuggingFace.

Expected output
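As a sketch of the data shapes this step produces, the rows below are illustrative stand-ins for the records that `datasets.load_dataset("BeIR/scifact", ...)` would yield; the ids and texts here are made up, not real dataset contents:

```python
# Illustrative sketch: shaping BEIR-style rows into the dicts used by later steps.
# The inline rows stand in for the HuggingFace BeIR/scifact corpus, queries,
# and qrels records; ids and texts are invented for illustration.

corpus_rows = [
    {"_id": "doc1", "title": "A title", "text": "Body text of the document."},
    {"_id": "doc2", "title": "Another title", "text": "More body text."},
]
query_rows = [{"_id": "q1", "text": "An example claim to verify."}]
qrel_rows = [{"query-id": "q1", "corpus-id": "doc2", "score": 1}]

# Corpus: id -> "title text"; queries: id -> text.
corpus = {r["_id"]: (r["title"] + " " + r["text"]).strip() for r in corpus_rows}
queries = {r["_id"]: r["text"] for r in query_rows}

# Qrels: query-id -> {corpus-id: relevance score}, the shape BEIR evaluators expect.
qrels: dict[str, dict[str, int]] = {}
for r in qrel_rows:
    qrels.setdefault(r["query-id"], {})[r["corpus-id"]] = r["score"]

print(len(corpus), len(queries), qrels)
```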
## Step 2 — Compute embeddings

Embed all documents and queries using `meetkai:qwen3-embedding-8b`. Process in batches to handle the volume.
Expected output
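A minimal batching sketch, assuming a hypothetical `embed_batch` callable that wraps the model endpoint (the name and signature are assumptions, not an API from this guide):

```python
from typing import Callable

def embed_in_batches(texts: list[str],
                     embed_batch: Callable[[list[str]], list[list[float]]],
                     batch_size: int = 64) -> list[list[float]]:
    """Embed texts in fixed-size batches to keep requests a manageable size."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(embed_batch(texts[i:i + batch_size]))
    return vectors

# Stub standing in for the real meetkai:qwen3-embedding-8b endpoint (4,096 dims).
def fake_embed(batch: list[str]) -> list[list[float]]:
    return [[0.0] * 4096 for _ in batch]

vecs = embed_in_batches(["some text"] * 150, fake_embed, batch_size=64)
print(len(vecs), len(vecs[0]))  # 150 4096
```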
## Step 3 — Benchmark the Text Store

Index all 5,183 documents, then run all 300 queries. Measure indexing throughput and search latency.

### Index documents
Expected output
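A sketch of how the throughput number is measured: wall-clock time around the batch indexing loop, with a no-op stub in place of the real indexing call (the `index_batch` callable is an assumption, not this guide's client API):

```python
import time

def measure_indexing_throughput(docs, index_batch, batch_size=100):
    """Time the batch-indexing loop and return docs/sec."""
    start = time.perf_counter()
    for i in range(0, len(docs), batch_size):
        index_batch(docs[i:i + batch_size])  # real code: one indexing request per batch
    elapsed = time.perf_counter() - start
    return len(docs) / elapsed

# No-op stub standing in for a real Text Store indexing request.
indexed = []
throughput = measure_indexing_throughput(list(range(5183)), indexed.extend)
print(f"{len(indexed)} docs, {throughput:.0f} docs/sec")
```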
### Run queries and measure latency
Each search request returns a `search_time_ms` field — the server-side database search duration, excluding network round-trip and any upstream processing.
We measure both this server-side time and the full end-to-end client-observed response time.
Expected output
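The p50/p95 rows in the results table are percentiles over per-query timings. A self-contained sketch of collecting both the server-reported and the client-observed times, then reducing them to percentiles (`run_search` here is a stub standing in for the real request):

```python
import math
import time

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (p in [0, 100])."""
    ordered = sorted(values)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[max(0, k)]

def run_search(query: str) -> dict:
    # Stub standing in for the real search request; the real response
    # carries search_time_ms (server-side DB search duration).
    return {"search_time_ms": 8.0, "results": []}

server_ms, e2e_ms = [], []
for q in ["query one", "query two", "query three"]:
    t0 = time.perf_counter()
    resp = run_search(q)
    e2e_ms.append((time.perf_counter() - t0) * 1000)  # client-observed time
    server_ms.append(resp["search_time_ms"])          # server-reported time

print(percentile(server_ms, 50), percentile(server_ms, 95))
```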
## Step 4 — Benchmark the Tables API

Index the same corpus into a Tables store with explicit schema and vector index, then run the same queries.

### Create table and index documents
Expected output
### Run queries with vector search
The Tables API also returns `searchTimeMs` — the server-side database search duration.
Expected output
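Conceptually, a vector search returns the k stored vectors nearest to the query vector. A brute-force reference implementation of that behavior, for illustration only (the service uses approximate indexes such as IVF/HNSW, and the actual distance metric is configurable):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def vector_search(query_vec, table, k=10):
    """Brute-force k-NN over (doc_id, vector) rows, highest cosine first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in table]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

table = [("d1", [1.0, 0.0]), ("d2", [0.0, 1.0]), ("d3", [0.7, 0.7])]
print(vector_search([1.0, 0.0], table, k=2))  # d1 first, then d3
```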
## Step 5 — Compute accuracy metrics

Evaluate retrieval quality using standard BEIR metrics: NDCG@10, Recall@10, and MRR@10.

### Print results
Expected output
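Minimal linear-gain implementations of the three metrics, for illustration (BEIR's own evaluation uses pytrec_eval; with binary relevance judgments like SciFact's, linear and exponential NDCG gains coincide):

```python
import math

def ndcg_at_k(ranked: list[str], rels: dict[str, int], k: int = 10) -> float:
    """NDCG@k with linear gain: rel / log2(rank + 1)."""
    dcg = sum(rels.get(doc, 0) / math.log2(i + 2) for i, doc in enumerate(ranked[:k]))
    ideal = sorted(rels.values(), reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(ranked: list[str], rels: dict[str, int], k: int = 10) -> float:
    """Fraction of relevant documents that appear in the top k."""
    relevant = {d for d, r in rels.items() if r > 0}
    return len(relevant & set(ranked[:k])) / len(relevant) if relevant else 0.0

def mrr_at_k(ranked: list[str], rels: dict[str, int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for i, doc in enumerate(ranked[:k]):
        if rels.get(doc, 0) > 0:
            return 1.0 / (i + 1)
    return 0.0

# A relevant doc at rank 2: NDCG ~0.63, Recall 1.0, MRR 0.5.
rels = {"d1": 1}
print(ndcg_at_k(["d2", "d1"], rels), recall_at_k(["d2", "d1"], rels), mrr_at_k(["d2", "d1"], rels))
```

Averaging each metric over all 300 queries gives the numbers in the results summary.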
- Embedding inference is the same for both backends — it runs the same model.
- DB search is the pure database time reported by the server (`search_time_ms`/`searchTimeMs`). This is where Text Store and Tables differ — Text Store runs hybrid (vector + FTS), Tables runs pure vector search.
- End-to-end response includes network round-trip, gateway overhead, and database search. The difference between end-to-end and DB search is the overhead.
## Step 6 — Cleanup

## Interpreting the results

### Accuracy context

The BEIR leaderboard reports NDCG@10 on SciFact for reference:

| Model | NDCG@10 |
|---|---|
| BM25 (lexical baseline) | ~66.5 |
| text-embedding-ada-002 | ~70–73 |
| text-embedding-3-large | ~74–76 |
| State-of-the-art (2024) | ~78–80 |
### What to look for
- NDCG@10 is the primary metric. It penalizes relevant documents that appear at lower ranks.
- Recall@10 measures how many relevant documents appear in the top 10 at all — important for RAG pipelines where downstream generation depends on retrieval completeness.
- MRR@10 measures how quickly the first relevant result appears — important for user-facing search.
- Text Store vs Tables: Text Store adds hybrid search (vector + keyword) automatically. Tables gives you explicit control over index type, distance metric, and filters.
### Reading the latency breakdown

A retrieval query has three latency components:

- Embedding inference — the time to convert the query text into a vector. This is model inference and is the same regardless of which storage backend you use.
- Database search — the time the search engine spends finding nearest neighbors. Reported by the server in `search_time_ms` (Text Store) or `searchTimeMs` (Tables). This is pure vector/hybrid search time with no network overhead.
- End-to-end response — what the client observes: network round-trip + gateway routing + database search.
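Plugging in the Text Store p50 numbers from the results summary makes the decomposition concrete:

```python
# p50 figures for the Text Store, from the results summary above (ms).
e2e_p50 = 45.0       # end-to-end response (network + embed + search)
embed_ms = 12.0      # embedding inference per query
db_search_p50 = 8.0  # server-side database search (search_time_ms)

# What remains is network round-trip plus gateway routing.
network_gateway_ms = e2e_p50 - embed_ms - db_search_p50
print(network_gateway_ms)  # 25.0
```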
### Storage technology comparison

| | Text Store | Tables |
|---|---|---|
| Setup | Zero config — just name + dimension | Explicit schema, field types, indices |
| Search | Hybrid (vector + full-text) automatic | Compose operations: vector_search, FTS, filter |
| Best for | Quick RAG prototyping, hybrid search out of the box | Custom schemas, filtered search, multi-index strategies |
| Index types | Automatic | IVF_FLAT, IVF_PQ, IVF_HNSW_PQ, IVF_HNSW_SQ |
## See also
- Index and search text for the full Text Store and Tables API reference.
- Files and vector stores for the OpenAI-compatible vector store API.
- GraphRAG evaluation for multi-hop retrieval benchmarking.