Retrieval Methods in RAG: Dense, Sparse and Hybrid

Part of the RAG series.

RAG retrieval can be dense (embedding-based), sparse (keyword-based) or hybrid (both). Each has trade-offs: dense retrieval captures meaning; sparse retrieval matches exact terms; hybrid combines them. You can also add a reranker to improve ordering. This post summarises these options and when to use them.

Dense Retrieval

Dense retrieval uses embeddings. You embed the query and each chunk, then return the chunks whose embeddings are closest to the query embedding (e.g. cosine similarity or dot product). It captures semantic similarity: "refund policy" can match "money-back guarantee" even without shared terms. This is the usual default in RAG. See Embeddings in RAG and Vector Databases for RAG.
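The ranking step can be sketched in a few lines. This is a toy illustration: the hard-coded three-dimensional vectors stand in for real embedding-model output (which would typically be hundreds of dimensions), and `dense_search` is a hypothetical helper, not a library API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy pre-computed embeddings standing in for a real embedding model.
chunks = {
    "refund policy":        [0.9, 0.1, 0.0],
    "money-back guarantee": [0.8, 0.2, 0.1],
    "shipping times":       [0.1, 0.9, 0.3],
}

def dense_search(query_vec, k=2):
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)
    return ranked[:k]

# A query vector near "refund policy" also surfaces "money-back guarantee",
# even though the two phrases share no terms.
print(dense_search([0.85, 0.15, 0.05]))
```

In production, a vector database performs this nearest-neighbour search approximately over millions of vectors rather than by brute force.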

Sparse Retrieval

Sparse retrieval uses term-based signals. The classic example is BM25: it scores documents by how often query terms appear and how rare they are (IDF). It does not use embeddings. It is strong when exact names, IDs or phrases matter (e.g. "Section 4.2", product codes). It is weak when the user rephrases or uses synonyms. Many search engines (Elasticsearch, OpenSearch, etc.) support BM25 or similar out of the box.
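To make the BM25 idea concrete, here is a minimal from-scratch scoring function. It implements the standard BM25 formula with the usual default parameters (k1 = 1.5, b = 0.75); in practice you would rely on a search engine's built-in implementation rather than this sketch.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query terms using BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: in how many docs does each query term appear?
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            # Rare terms get high IDF; frequent-in-doc terms saturate via k1.
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

docs = [
    "see section 4.2 for refund terms".split(),
    "our money back guarantee explained".split(),
    "section 4.2 covers refunds and section 4.3 covers returns".split(),
]
# Exact-term queries like "section 4.2" match only docs containing those tokens.
print(bm25_scores(["section", "4.2"], docs))
```

Note the weakness mentioned above: the second document, which is about refunds but phrased as "money back guarantee", scores zero for this query.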

Hybrid Retrieval

Hybrid search runs both dense and sparse retrieval and then fuses the results. Common fusion strategies:

  • Reciprocal rank fusion (RRF). Combine rankings by summing 1/(k + rank) for each hit; sort by total score. No need to normalize dense vs sparse scores.
  • Linear combination. score = α * dense_score + (1 − α) * sparse_score, then sort. You must normalize both scores to a common scale (e.g. 0–1).

Hybrid often improves recall and robustness when you have a mix of semantic and keyword-style queries (e.g. docs with precise names and also conceptual questions).
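Both fusion strategies fit in a short sketch. The function names (`rrf`, `linear_fuse`) and the choice of k = 60 and α = 0.5 are illustrative defaults, not a specific library's API.

```python
def rrf(dense_ids, sparse_ids, k=60):
    """Reciprocal rank fusion: sum 1/(k + rank) across both ranked lists."""
    scores = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def minmax(xs):
    """Rescale raw scores to 0-1 so dense and sparse become comparable."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def linear_fuse(dense, sparse, alpha=0.5):
    """dense/sparse: {doc_id: raw_score}. Normalize each, then mix by alpha."""
    d = dict(zip(dense, minmax(list(dense.values()))))
    s = dict(zip(sparse, minmax(list(sparse.values()))))
    ids = set(d) | set(s)
    combined = {i: alpha * d.get(i, 0.0) + (1 - alpha) * s.get(i, 0.0) for i in ids}
    return sorted(combined, key=combined.get, reverse=True)

# "a" and "c" appear in both rankings, so RRF pushes them to the top.
print(rrf(["a", "b", "c"], ["c", "a", "d"]))
```

RRF's appeal is exactly that it only looks at ranks, so you never have to answer the awkward question of how a cosine similarity of 0.83 compares to a BM25 score of 12.4.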

Reranking

A reranker takes the top-k results from your first-stage retrieval (dense, sparse or hybrid) and re-scores or re-orders them with a more accurate but slower model. Typically you retrieve 20–50 chunks, then rerank to get the top 5–10. Rerankers are often small cross-encoder or similar models that score (query, chunk) pairs. They significantly improve precision and context quality in many RAG setups.
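The second-stage pattern looks like this. The `overlap_score` toy scorer below is a stand-in for a real cross-encoder (assumed here for the sake of a self-contained example); the generic `rerank` function is the part that carries over.

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Score each (query, chunk) pair with a slower, more accurate model
    and keep the best top_n."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]

# Toy scorer standing in for a cross-encoder: fraction of query words in chunk.
def overlap_score(query, chunk):
    q = set(query.lower().split())
    return len(q & set(chunk.lower().split())) / len(q)

candidates = [
    "shipping times vary by region",
    "our refund policy allows returns within 30 days",
    "refund requests are processed in 5 days",
]
top = rerank("refund policy details", candidates, overlap_score, top_n=2)
print(top)
```

With a real cross-encoder you would swap `overlap_score` for a model that jointly encodes the query and chunk; that joint attention is what makes rerankers more accurate than the bi-encoder used for first-stage dense retrieval.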

# Conceptual pipeline: hybrid + rerank
dense_hits = vector_db.search(query_embedding, k=30)
sparse_hits = bm25.search(query, k=30)
fused = rrf(dense_hits, sparse_hits)[:20]
scores = reranker.score([(query, c.text) for c in fused])
reranked = [c for c, _ in sorted(zip(fused, scores), key=lambda p: p[1], reverse=True)][:5]
context = [c.text for c in reranked]

When to Use What

  • Dense only. Good default when queries are conversational or conceptual and you care about meaning.
  • Sparse only. When exact terms, IDs or citations matter and you have a strong keyword signal.
  • Hybrid. When you have both conceptual and precise queries, or when dense retrieval alone misses important keyword matches.
  • Reranking. When you can afford a second-stage model and want better precision; almost always helpful after hybrid or dense retrieval.

For measuring retrieval and overall RAG quality, see Evaluating RAG Systems.

Get in touch

Questions about RAG or AI knowledge systems? Tell us about your project.