Vespa vs Qdrant vs Turbopuffer for large-scale hybrid search (BM25 + text & image vectors)
Short answer: Vespa looks like the right default for this workload; the rest of this answer explains why, and when to deviate.
* * *
### My choice for your case: default to **Vespa**, with two “escape hatches” (a Qdrant hybrid stack or a Turbopuffer retrieval service)
Your workload is “product search at scale,” not “vector search.” At **170M products**, with **text + image embeddings** and a **hard sub-100 ms SLA**, the platform that wins is the one that lets you:
  1. **retrieve candidates from multiple signals cheaply**,
  2. **fuse and normalize those signals predictably**,
  3. **apply business logic and reranking without extra network hops**,
4. **keep p99 stable under filters, skew, and failover**.
That set of constraints points most directly to **Vespa** as the default.
* * *
## Why I default to Vespa
### 1) You need a real multi-stage ranking pipeline inside the engine
Hybrid search at your scale is almost always:
* Stage A: lexical + ANN (text) + ANN (image) candidate generation
* Stage B: fusion and score normalization
* Stage C: rerank top K with business logic and optionally a stronger model
Vespa bakes this into the engine as “phased ranking” and “global-phase ranking.” Global-phase is explicitly positioned for expensive reranking (often cross-encoders) and runs on the globally best hits after distributed retrieval. (Vespa Document)
That matters because your SLA is end-to-end. Every extra service hop (Solr + vector DB + fusion service + reranker) increases p95 and p99 variance.
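As a rough sketch of what “one request path” means in practice, here is a single hybrid query against Vespa’s HTTP query API that combines lexical retrieval with ANN over both a text and an image embedding. The endpoint, the field names, and the `hybrid` rank profile (where BM25, vector closeness, and any global-phase reranking would be defined in the schema) are hypothetical placeholders, not a prescription:

```python
import requests

# One request covers candidate generation (lexical OR two ANN retrievals)
# plus ranking. All names below are placeholders for your own schema.
query_text = "stainless steel hex bolt m8"
text_vec = [0.0] * 768   # placeholder: query embedding from your text model
image_vec = [0.0] * 512  # placeholder: query embedding in your image space

body = {
    "yql": (
        "select * from sources * where userQuery() "
        "or ({targetHits: 200}nearestNeighbor(text_embedding, q_text)) "
        "or ({targetHits: 200}nearestNeighbor(image_embedding, q_image))"
    ),
    "query": query_text,
    "ranking": "hybrid",              # rank profile with first-phase + global-phase
    # Depending on Vespa version, query tensors may need the literal
    # tensor string form instead of a plain JSON array.
    "input.query(q_text)": text_vec,
    "input.query(q_image)": image_vec,
    "hits": 24,
    "timeout": "80ms",                # leave headroom inside the 100 ms budget
}

resp = requests.post("http://vespa-query.internal:8080/search/", json=body, timeout=0.2)
hits = resp.json()["root"].get("children", [])
```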
### 2) “BM25” means BM25, not “token filter”
If “BM25 or equivalent” in your requirements really means “search-engine grade lexical relevance,” Vespa supports BM25 as a rank feature over indexed string fields and explicitly frames it as cheap enough for first-phase ranking. (Vespa Document)
This is a key separation from vector-first databases where “text search” can be closer to filtering than full lexical ranking.
### 3) Multimodal (text + image vectors) is a first-class modeling problem
You want at least two vector spaces per product (text embedding and image embedding), often more if you store multiple images or multiple embedding models. Vespa’s model encourages putting this into the schema and ranking expressions, then blending in the ranking pipeline (early for recall, late for precision). The reason this matters is not “features,” it’s **operational simplicity**: one query plan, one result list, one ranking definition.
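On the feed side, both embeddings live on the same document. A minimal sketch against Vespa’s `/document/v1` API, with a hypothetical `product` schema, namespace, and field names (the tensor types themselves are declared in the schema, e.g. as indexed float tensors):

```python
import requests

# Feed one product carrying a text embedding and an image embedding.
# Schema name, namespace, field names, and dimensions are placeholders.
doc = {
    "fields": {
        "title": "Stainless steel hex bolt M8 x 40",
        "brand": "Acme Fasteners",
        "in_stock": True,
        "text_embedding": {"values": [0.0] * 768},
        "image_embedding": {"values": [0.0] * 512},
    }
}

resp = requests.post(
    "http://vespa-feed.internal:8080/document/v1/catalog/product/docid/sku-123456",
    json=doc,
    timeout=5,
)
resp.raise_for_status()
```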
### 4) Distributed operation and reshaping the cluster is built around “buckets,” not manual sharding
At 170M, you will resize clusters, rebalance, and handle node loss. Vespa’s content model manages documents in “buckets” and is explicit that you do not manually control sharding. (Vespa Document)
That does not eliminate ops work, but it reduces the number of “hand-crafted shard topology” decisions you must get right forever.
### 5) You can keep the hard stuff (reranking) in the same serving tier
If you plan to use cross-encoders or ONNX models for reranking, Vespa supports ONNX model usage in ranking. The phased ranking docs explicitly describe global-phase as optimized for inference use cases. (Vespa Document)
That is exactly what keeps “BM25 + embeddings + business logic + rerank” inside one request path.
### 6) “Open” deployment is clean
Vespa is Apache-2.0 licensed, self-hostable, and also has a managed offering. (GitHub)
* * *
## When I would _not_ choose Vespa first
### A) If your team wants a simpler engine and you are OK with a pipeline
If your organization is comfortable running multiple services and you want to keep “classic lexical search” in Solr (at least initially), then Qdrant is a strong candidate for the vector side and hybrid fusion, with your own orchestrator.
### B) If cost dominates and you accept “retrieval service + app-layer ranking”
If your primary constraint is cost and ops simplicity, and you accept doing multiple retrieval calls and fusing/reranking in your application, then Turbopuffer can be compelling. But you must accept its architectural and product constraints (below).
* * *
## My “escape hatch #1”: choose **Qdrant** if you accept either sparse lexical retrieval or a dual-engine design
### Why Qdrant can be the right choice
1. **Hybrid and multi-stage queries are explicit in the Query API** (prefetch, fusion, rerank; see the sketch after this list). (Qdrant)
2. **Multiple named vectors per point** are documented and directly support text + image embeddings in one record. (Qdrant)
3. Qdrant provides practical guidance for sizing and performance tuning (RAM vs disk, replication, quantization). (Qdrant)
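A minimal sketch of that Query API shape with `qdrant-client`, assuming a hypothetical `products` collection that carries named dense vectors `text` and `image` plus a sparse `lexical` vector: two dense prefetches and one sparse prefetch are fused with RRF in a single call, under a payload filter.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://qdrant.internal:6333")

text_vec = [0.0] * 768   # placeholder query embeddings
image_vec = [0.0] * 512

result = client.query_points(
    collection_name="products",          # hypothetical collection
    prefetch=[
        models.Prefetch(query=text_vec, using="text", limit=200),
        models.Prefetch(query=image_vec, using="image", limit=200),
        models.Prefetch(
            query=models.SparseVector(indices=[1023, 4711], values=[1.2, 0.8]),
            using="lexical",
            limit=200,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),  # server-side fusion
    query_filter=models.Filter(
        must=[models.FieldCondition(key="in_stock", match=models.MatchValue(value=True))]
    ),
    limit=50,
)
top_ids = [p.id for p in result.points]
```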
### The reason I still do not default to Qdrant for your spec
Your requirements say “BM25 or equivalent.” Qdrant’s full-text index is documented as enabling you to **filter points by presence of a word or phrase** in a payload field. (Qdrant)
That is not the same as “Solr-grade lexical ranking,” with all the usual analyzers, proximity scoring behavior, and relevance tuning workflow.
So Qdrant is best when:
* you are comfortable using **sparse vectors** for lexical-like retrieval, fused with dense retrieval (and optionally reranked), or (Qdrant)
* you keep Solr/ES as the lexical system and treat Qdrant as the vector retrieval system.
In that world, Qdrant can win on engineering speed and modularity, but you accept higher tail-latency risk from multi-service orchestration.
* * *
## My “escape hatch #2”: choose **Turbopuffer** if cost and ops dominate, and you can live with its hybrid model
### Why Turbopuffer can be the right choice
1. It is built around object storage with caching, and explicitly targets “search queries need to finish in <100ms,” while acknowledging occasional cold queries in the hundreds of ms. (turbopuffer)
2. Its docs encourage **multi-query hybrid** and **client-side fusion (e.g., RRF)**, and explicitly note that issuing several turbopuffer queries per user query is common (a minimal RRF sketch follows this list). (turbopuffer)
3. It supports BYOC deployment into your Kubernetes cluster with a vendor-operated control plane model. (turbopuffer)
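Because fusion and reranking live in your application with Turbopuffer, you own the reciprocal-rank-fusion step yourself. A minimal, engine-agnostic sketch of that step (the retrieval calls are omitted; only placeholder ID lists are shown):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60, top_n=50):
    """Reciprocal Rank Fusion over several ranked lists of document IDs.

    ranked_lists holds best-first ID lists, e.g. from a full-text query,
    a text-embedding query, and an image-embedding query. k=60 is the
    conventional RRF constant.
    """
    scores = defaultdict(float)
    for ids in ranked_lists:
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Placeholder results from three separate retrieval calls:
fused = rrf_fuse([
    ["sku-1", "sku-7", "sku-3"],   # full-text query
    ["sku-3", "sku-1", "sku-9"],   # text-embedding ANN query
    ["sku-9", "sku-3", "sku-2"],   # image-embedding ANN query
])
print(fused)
```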
### The reason I do not choose Turbopuffer first for your spec
**Multimodal vectors.** Today, their write docs define vectors as an attribute named `vector`, and every vector in a namespace must share the same dimensionality. (turbopuffer)
Their roadmap lists “multiple vector columns” as a future item. (turbopuffer)
For your case (text vector + image vector), that means you must either:
* use separate namespaces or separate indexing patterns and fuse client-side, or
* compress into one “unified” embedding approach, which often reduces controllability, or
* wait until multi-vector columns exist and are proven in production.
Also, Turbopuffer is commercial-only today. Their own tradeoffs doc states they do not offer an open source version and they encourage you to do second-stage reranking in your own application code. (turbopuffer)
So Turbopuffer can win if:
* you want a retrieval substrate that is cheap to store huge corpora on,
* your tenancy model maps cleanly to namespaces,
* you can tolerate occasional cold tails or you can pre-warm effectively,
* you accept app-layer fusion and reranking.
* * *
## The actual decision checklist I would use for your evaluation
### 1) How “Solr-like” must lexical relevance be?
* If you need true Solr-grade lexical behavior and tuning: favor **Vespa** (full replacement) or **Solr + Qdrant** (dual engine).
* If sparse lexical is acceptable: **Qdrant** becomes more plausible as a single engine.
(Load-bearing Qdrant detail: full-text index is described as filtering by token/phrase presence. (Qdrant))
### 2) Do you need text + image vectors _today_, inside one index?
* Vespa: yes, naturally via multiple tensor fields and ranking blends. (Vespa Document)
* Qdrant: yes, via named vectors per point. (Qdrant)
* Turbopuffer: roadmap says multiple vector columns are upcoming; current docs describe a single `vector` attribute. (turbopuffer)
This single item alone often eliminates Turbopuffer for multimodal product search right now.
### 3) Where must business logic and reranking live to hit p99?
* If you need it inside the engine for tail control: **Vespa**. (Vespa Document)
* If you can accept external rerank: **Qdrant** or **Turbopuffer**.
### 4) What is your filter selectivity distribution?
B2B product search is filter-heavy, so you must benchmark “ANN + filters” scenarios explicitly (a minimal harness sketch follows the list below).
* Turbopuffer explicitly recommends partitioning into namespaces instead of filtering “where possible.” (turbopuffer)
* Qdrant emphasizes payload indexing and planning. (Qdrant)
* Vespa has a mature model for distributed serving and late-stage reranking that helps when filters shrink candidate sets. (Vespa Document)
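A minimal, engine-agnostic harness sketch for that benchmark; `run_query` is a placeholder for whichever client call you are measuring, and the query set should deliberately mix broad and highly selective filters:

```python
import time
import statistics

def measure_tail_latency(run_query, queries, warmup=50):
    """Return p50/p95/p99 latency in ms over (query, filter) pairs.

    run_query(query, filter) is a placeholder for your engine-specific call
    (Vespa, Qdrant, or Turbopuffer).
    """
    for q, f in queries[:warmup]:
        run_query(q, f)                       # warm caches before measuring
    latencies_ms = []
    for q, f in queries:
        t0 = time.perf_counter()
        run_query(q, f)
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    cuts = statistics.quantiles(latencies_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```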
### 5) What is your vector storage strategy?
At 170M products, vectors dominate cost. You will likely need quantization, on-disk indexes, or multi-stage retrieval (see the back-of-envelope sketch after this list).
* Qdrant’s capacity planning and optimize guides make this a first-class decision. (Qdrant)
* Turbopuffer is explicitly built around object storage + cache. (turbopuffer)
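A back-of-envelope sketch of the raw vector footprint, assuming 768-dim text and 512-dim image embeddings at float32 (replication, HNSW graph overhead, and payload/attribute storage come on top of either number):

```python
# Raw vector storage for 170M products under assumed embedding sizes.
products = 170_000_000
bytes_per_float = 4                 # float32

text_dim, image_dim = 768, 512      # assumptions, not your actual models
raw_bytes = products * (text_dim + image_dim) * bytes_per_float

print(f"raw float32 vectors: {raw_bytes / 1e12:.2f} TB")      # ~0.87 TB
# int8 scalar quantization cuts this roughly 4x, at some recall cost:
print(f"int8-quantized:      {raw_bytes / 4 / 1e12:.2f} TB")  # ~0.22 TB
```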
* * *
## Bottom line recommendation
* **If your goal is one platform that can replace Solr and still do multimodal hybrid ranking under strict latency constraints, pick Vespa first.** You get BM25, composable ANN retrieval, and a built-in multi-stage ranking pipeline with ONNX-friendly reranking in one request path. (Vespa Document)
* **If you want a modular architecture and accept a hybrid pipeline (or sparse lexical), pick Qdrant.** It has named vectors and a strong multi-stage Query API, but its “full-text index” is documented as filtering by token presence, so validate whether it meets your lexical needs. (Qdrant)
* **If cost is the top constraint and you accept app-layer fusion plus current multimodal constraints, consider Turbopuffer.** But its own docs state there is no open-source version, recommend app-layer reranking, and list multiple vector columns as a roadmap item. (turbopuffer)