Elasticsearch Vector Search Deep Dive: dense_vector, HNSW Tuning, and Hybrid Search in Practice

Elasticsearch natively supports vector retrieval and hybrid search. Its core value lies in embedding text, images, or behavioral data into high-dimensional vectors and using approximate nearest neighbor retrieval to overcome the weak semantics and unstable recall of keyword-only search. This article focuses on three keywords: dense_vector, HNSW, and hybrid search, and explains parameter behavior, query syntax, and a practical tuning path.

Technical specification snapshot

| Parameter | Description |
| --- | --- |
| Search Engine | Elasticsearch 8.x |
| Core Languages | JSON DSL, Bash, Python |
| Retrieval Protocol | HTTP/REST |
| Core Capabilities | dense_vector, kNN, Hybrid Search |
| Nearest Neighbor Algorithm | HNSW |
| Similarity Metrics | cosine, l2_norm, dot_product |
| Core Dependencies | Elasticsearch vector indexing, embedding models, Explain API |

Vector search turns semantic similarity into distance computation

Vector search encodes text, images, and user behavior into floating-point arrays, then uses cosine similarity or Euclidean distance to find the closest content. Compared with an inverted index, it does not depend on exact lexical matching, which makes it better suited for semantic retrieval, recommendation recall, and multimodal search.
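To make the distance computation concrete, here is a minimal, dependency-free Python sketch of the two metrics named above; the toy vectors are illustrative stand-ins for real embeddings:

```python
import math

def cosine_similarity(a, b):
    # Dot product normalized by both magnitudes; 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def euclidean_distance(a, b):
    # Straight-line (l2) distance; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [0.1, 0.9, 0.2]
doc_close = [0.2, 0.8, 0.1]   # nearly the same direction as the query
doc_far = [0.9, -0.1, 0.4]    # points elsewhere in the space

assert cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far)
assert euclidean_distance(query, doc_close) < euclidean_distance(query, doc_far)
```

Elasticsearch computes the same kind of metric internally; the difference is that HNSW prunes the candidate set so the distance is only evaluated against a small graph neighborhood instead of the whole corpus.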

In Elasticsearch, this capability is powered by the dense_vector field and the knn query. Its key advantage is not simply that it is “more accurate,” but that it can still maintain strong recall and low latency at scale by using approximate algorithms.

A typical use case is recommendation recall

In live streaming or e-commerce recommendation systems, each user and each item can be encoded as a vector. Given a query vector, the system quickly finds the most similar TopK candidates from a massive candidate pool. That is the core task of approximate nearest neighbor search.

{
  "frame_vector": {
    "type": "dense_vector",
    "dims": 512,
    "index": true,
    "similarity": "cosine"
  }
}

This mapping defines a 512-dimensional vector field and enables approximate indexing based on cosine similarity.

Vector field design determines whether the index is production-ready

dims is a fixed dimension and cannot be modified in place after index creation. If your embedding model changes from 384 dimensions to 768 dimensions, you must rebuild the index. Otherwise, Elasticsearch will raise a dimension mismatch error immediately.

index determines whether Elasticsearch builds an approximate nearest neighbor index. If it is false, Elasticsearch can only fall back to brute-force computation. Once your dataset reaches the million-scale range, query latency can quickly become unmanageable.
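If a field must stay `index: false`, the usual exact-search fallback in 8.x is a `script_score` query with the built-in `cosineSimilarity` function. A sketch that only builds the request body (the `frame_vector` field name follows the mapping above; actually sending the request to a cluster is omitted):

```python
def exact_knn_body(field, query_vector, size=10):
    # Brute-force kNN via script_score: scores every matching document,
    # so it works even when the dense_vector field has "index": false.
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # cosineSimilarity returns [-1, 1]; +1.0 keeps _score non-negative.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }

body = exact_knn_body("frame_vector", [0.1, -0.2, 0.3], size=5)
```

This trades index-time cost for query-time cost, which is exactly why it stops scaling around the million-document mark.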

HNSW parameters directly affect recall and cost

HNSW is a graph-based approximate retrieval algorithm. Its core idea is to reduce the search space with a multi-layer adjacency graph. Elasticsearch exposes two critical parameters: m and ef_construction.

{
  "frame_vector": {
    "type": "dense_vector",
    "dims": 512,
    "index": true,
    "similarity": "cosine",
    "index_options": {
      "type": "hnsw",
      "m": 16,
      "ef_construction": 100
    }
  }
}

This configuration builds the HNSW graph during indexing. A larger m creates a denser graph, and a larger ef_construction improves graph quality.

k and num_candidates control result count and recall depth separately

k is the number of final results returned to the application, while num_candidates is the candidate set size collected per shard before final ranking. Many production misjudgments come from tuning only k but not num_candidates, which leads to a situation where “the number of results looks correct, but recall quality is unstable.”

In practice, num_candidates defines the search breadth at query time. The larger it is, the closer the result gets to exact search. The tradeoff is higher CPU usage, more memory access, and worse P99 latency.
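One cheap guardrail follows from this relationship: Elasticsearch rejects requests where num_candidates is smaller than k, so the pair is worth validating client-side before every load test. A small sketch (the helper name and defaults are illustrative):

```python
def knn_clause(field, query_vector, k=10, num_candidates=500):
    # Fail fast locally: Elasticsearch requires num_candidates >= k.
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    return {
        "field": field,
        "query_vector": query_vector,
        "k": k,                           # results handed back to the application
        "num_candidates": num_candidates, # per-shard recall depth before final ranking
    }
```

Keeping the two values in one helper also makes it harder to tune k in isolation, which is the misjudgment described above.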

In practice, build the graph conservatively first, then increase query breadth gradually

For small datasets, you can use brute-force search directly or set num_candidates close to the full corpus size. At large scale, teams typically start with 500 in load testing, then gradually increase it to 1000, 3000, or even 10000 based on recall targets and SLA requirements.

| Data Scale | k | Recommended num_candidates | Notes |
| --- | --- | --- | --- |
| Thousands | 10 | Full set or close to full set | Can be treated approximately as exact search |
| Millions | 10 | 100-1000 | Balance latency and recall first |
| Tens of millions and above | 10 | 500-10000 | Work backward from benchmarking results to find the optimum |

ef_construction and ef_search should be optimized separately

ef_construction applies during graph construction and determines index graph quality. ef_search applies during query execution and is typically reflected as num_candidates in Elasticsearch. They are not substitutes for each other. One represents one-time indexing cost, and the other represents online query cost.

A more practical approach is to keep ef_construction relatively high during indexing to secure a stronger recall ceiling, then control num_candidates online according to your latency budget.

{
  "index_options": {
    "type": "hnsw",
    "m": 24,
    "ef_construction": 300
  }
}

The goal of this configuration is to improve graph quality and leave more room for query-time recall.

Hybrid search should use query and knn as parallel top-level clauses

In Elasticsearch 8.x, query and knn must appear at the same top level. You cannot place knn inside bool.should. This is one of the most common syntax pitfalls when teams first adopt vector search.

When both clauses are parallel, you can generally interpret the final score as the sum of the text score and the vector score. That means even if a document does not match the text query, it can still enter the final result set if its vector similarity is high enough.

{
  "query": {
    "multi_match": {
      "query": "手机",
      "fields": ["title^2"]
    }
  },
  "knn": {
    "field": "frame_vector",
    "query_vector": [0.1, -0.2, 0.3],
    "k": 10,
    "num_candidates": 500
  },
  "explain": true
}

This query runs full-text search and vector recall at the same time, and uses explain to inspect how the final score is composed.
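The additive score composition can be checked with plain arithmetic. In this toy sketch, hybrid_score is a hypothetical helper mirroring the sum-of behavior, and the scores and document IDs are made up:

```python
def hybrid_score(text_score, knn_score):
    # Parallel query + knn clauses sum their scores; a channel that did not
    # match contributes 0 instead of excluding the document.
    return (text_score or 0.0) + (knn_score or 0.0)

docs = {
    "doc_a": hybrid_score(2.03, 5.02),  # matched both channels
    "doc_b": hybrid_score(None, 6.10),  # vector-only hit still ranks
    "doc_c": hybrid_score(3.50, None),  # text-only hit
}
ranked = sorted(docs, key=docs.get, reverse=True)
print(ranked)  # ['doc_a', 'doc_b', 'doc_c']
```

Note how doc_b outranks doc_c despite a text score of 0: this is the behavior that surprises teams who expect text matching to be a precondition.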

Score explanations let you validate hybrid ranking logic directly

If _explanation contains sum of:, it usually means the text score and kNN score have been added together. If one channel does not match, its detail may not appear, but that does not mean the channel was excluded from ranking.

{
  "description": "sum of:",
  "details": [
    {"description": "score from multi_match query", "value": 2.03},
    {"description": "within top k documents", "value": 5.02}
  ]
}

This explanation shows that the target document matched both the text and vector channels, and both scores contributed to the final rank.

A dish search example shows why pure vector hits can appear

When the query term is “水煮鱼” (Sichuan boiled fish), a document may still enter TopK even if its text does not contain that term at all, as long as its vector is very close to the query vector. In the parallel query model, a text score of 0 does not mean the document is filtered out.

This means hybrid search behaves more like “multi-channel recall plus score fusion” by default, rather than “text must match before vector ranking begins.” If your business logic requires strict filtering, you must explicitly use knn.filter.

curl -X GET "http://127.0.0.1:9200/live_room/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "multi_match": {
        "query": "水煮鱼",
        "fields": ["anchor_name^2", "room_title"]
      }
    },
    "knn": {
      "field": "frame_vector",
      "query_vector": [0.12, 0.08, -0.31],
      "k": 3,
      "num_candidates": 100
    },
    "explain": true
  }'

This command reproduces a real-world hybrid recall scenario and helps analyze the text score and vector score for each document.

The root causes of common failures usually fall into syntax, dimensions, and licensing

The first category is putting knn inside bool, which causes a syntax error immediately. The second is a mismatch between query_vector and the mapping dims, which commonly happens after a model upgrade when the index was not rebuilt. The third is running into license restrictions when using RRF.
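The dimension mismatch in particular is easy to catch before the request leaves the client. A sketch of such a check (the helper is hypothetical; 512 follows the mapping used in this article):

```python
def validate_query_vector(query_vector, mapping_dims):
    # Elasticsearch rejects query vectors whose length differs from the
    # dense_vector "dims" in the mapping, so check locally and fail with context.
    if len(query_vector) != mapping_dims:
        raise ValueError(
            f"query_vector has {len(query_vector)} dims but the mapping expects "
            f"{mapping_dims}; after a model change, reindex into a new index"
        )
    return query_vector

validate_query_vector([0.0] * 512, 512)  # passes silently
```

Running this check at the embedding-service boundary surfaces model upgrades immediately, instead of as scattered query failures in production.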

If your business requires vector results only for documents that match the text query

The most direct solution is to use knn.filter as a pre-filter, so kNN searches only within documents that satisfy the text condition. This completely removes results that are strong vector matches but textually irrelevant.

{
  "knn": {
    "field": "frame_vector",
    "query_vector": [0.1, -0.2, 0.3],
    "k": 10,
    "filter": {
      "multi_match": {
        "query": "水煮鱼",
        "fields": ["anchor_name^2", "room_title"]
      }
    }
  }
}

This query uses the text condition as a filter to constrain the vector recall set at the source.

Application-layer RRF is a practical fallback when licensing is restricted

If you cannot use the commercial RRF feature, you can run full-text search and kNN search separately, then merge the rankings in the application layer. This approach is simple to implement and works especially well for smaller services that want to validate ranking gains first.

def rrf_contribution(rank, k=60):
    return 1 / (rank + k)  # Core logic: each result list contributes a reciprocal-rank score; k dampens top-rank dominance

text_hits = es.search(index="live_room", body={...})  # Run full-text search
knn_hits = es.search(index="live_room", body={...})   # Run vector search
merged = {}

for r, hit in enumerate(text_hits["hits"]["hits"], start=1):
    merged[hit["_id"]] = merged.get(hit["_id"], 0.0) + rrf_contribution(r)  # Text-channel contribution

for r, hit in enumerate(knn_hits["hits"]["hits"], start=1):
    merged[hit["_id"]] = merged.get(hit["_id"], 0.0) + rrf_contribution(r)  # Vector-channel contribution; docs in both channels accumulate both

sorted_ids = sorted(merged, key=merged.get, reverse=True)  # Sort output by fused score

This code shows how to implement RRF fusion manually at the application layer for text retrieval and vector retrieval.
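Since the snippet above depends on a live cluster, the fusion math itself can be verified standalone. This toy sketch applies standard RRF to two made-up rankings; each channel contributes 1/(rank + k), and documents found by both channels accumulate both contributions:

```python
def rrf_contribution(rank, k=60):
    return 1 / (rank + k)

text_ranking = ["doc_a", "doc_b", "doc_c"]   # made-up full-text order
knn_ranking = ["doc_c", "doc_a", "doc_d"]    # made-up vector order

fused = {}
for ranking in (text_ranking, knn_ranking):
    for rank, doc_id in enumerate(ranking, start=1):
        fused[doc_id] = fused.get(doc_id, 0.0) + rrf_contribution(rank)

result = sorted(fused, key=fused.get, reverse=True)
print(result)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that appear in both lists (doc_a, doc_c) float above single-channel hits, which is exactly the behavior RRF is chosen for.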

Production best practices should center on observable tuning

First, verify that vector dimensions, similarity settings, and model output are fully aligned. Then build a high-quality index with a relatively high ef_construction. During query execution, start with a smaller num_candidates in benchmarking and record recall, average latency, and P99 at the same time.
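Recall has to be measured against exact results, for example from a brute-force run over a sample of queries. A minimal sketch of recall@k (the document IDs are illustrative):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    # Fraction of the exact top-k that approximate search recovered.
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

exact = ["d1", "d2", "d3", "d4", "d5"]    # ground truth from exact search
approx = ["d1", "d3", "d7", "d2", "d9"]   # HNSW results under a given num_candidates
print(recall_at_k(approx, exact, k=5))    # 0.6 → three of five exact hits recovered
```

Tracking this number next to average latency and P99 for each num_candidates value turns the tuning loop into a measurable tradeoff rather than guesswork.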

If the result set contains many documents that are semantically related but operationally irrelevant, first check whether your hybrid query mode should switch to knn.filter, rather than blindly increasing num_candidates.

A parameter quick reference works well as a pre-launch checklist

| Goal | Parameter | Recommended Value |
| --- | --- | --- |
| Vector dimension consistency | dims | Must exactly match model output |
| Graph quality | ef_construction | 200-500 |
| Graph connectivity density | m | 16-32 |
| Query recall depth | num_candidates | 500-10000 |
| Result count | k | Business-defined, such as 10 |

FAQ

Q1: Why does Elasticsearch throw an error when I put knn inside bool?

A: Because Elasticsearch 8.x requires knn to be a top-level parameter that sits alongside query. It cannot be nested inside bool. If you need filtering, use knn.filter first.

Q2: Why can I not just update the mapping when query_vector dimensions do not match?

A: Because dense_vector.dims is immutable after creation. If the model dimension changes, you must create a new index and reindex your data. Otherwise, both queries and writes will fail.

Q3: Why can documents that do not match the text query still rank near the top in hybrid search?

A: Because in the parallel model, the final score is usually query_score + knn_score. Not matching the text query only means the text score is 0. It does not mean the document is filtered out. If you need to enforce text matching, use knn.filter.

Summary

This article systematically explains the core mechanics of Elasticsearch vector search, covering dense_vector mappings, HNSW parameters, tuning for k and num_candidates, query + knn hybrid retrieval, score interpretation, and common error handling. It is well suited for teams building semantic search and recommendation systems.