Elasticsearch highlighting marks matched text in search results to solve a common UX problem: users get a hit, but cannot quickly see what matters. This guide focuses on basic configuration, how to choose among three highlighter types, and performance optimization for log search, article search, and knowledge base scenarios.
Keywords: Elasticsearch, highlighting, FVH.
Technical specification snapshot
| Parameter | Description |
|---|---|
| Language | JSON DSL, with direct integration in Java-based scenarios |
| Protocol | REST API |
| Core dependencies | Elasticsearch, Lucene, Kibana |
| Default highlighter | unified |
| High-performance highlighter | fvh |
Elasticsearch highlighting improves result readability
Highlighting is not the query capability itself. It acts as an interpretation layer for query results. By wrapping matched text with pre_tags and post_tags, it helps users quickly spot key information in long text, logs, or search result pages.
The most common pain points fall into three categories: results match but are hard to scan, long content makes navigation slow, and multi-field matches create fragmented presentation. Elasticsearch provides the highlight module to address exactly these issues.
Basic highlighting configuration covers most business scenarios
GET kibana_sample_data_logs/_search
{
  "track_total_hits": true,
  "query": {
    "match": {
      "message": {
        "query": "type:astronauts elasticsearch Mozilla",
        "analyzer": "standard" // Use the standard analyzer for the query
      }
    }
  },
  "highlight": {
    "highlight_query": {
      "match": {
        "message": "Mozilla" // Only highlight Mozilla
      }
    },
    "fields": {
      "message": {
        "pre_tags": ["<mark>"], // Prefix tag for matched terms
        "post_tags": ["</mark>"] // Suffix tag for matched terms
      }
    }
  }
}
This configuration enables a refined presentation pattern: query multiple terms, but highlight only one of them.
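On the client side, highlighted fragments arrive in a `highlight` object next to each hit's `_source`, keyed by field name. The sketch below reads them from a hand-written sample shaped like a search response (the IDs and message text are invented for illustration) and falls back to the start of the `_source` value when a hit carries no fragments. `snippets_for` is a hypothetical helper name, not part of any client library.

```python
def snippets_for(hit, field, fallback_chars=100):
    """Return highlighted fragments for a field, or a plain-text fallback."""
    fragments = hit.get("highlight", {}).get(field)
    if fragments:
        return fragments
    # No fragments for this field: fall back to the start of the source value.
    raw = str(hit.get("_source", {}).get(field, ""))
    return [raw[:fallback_chars]] if raw else []


# Hand-written sample shaped like a search response; the values are invented.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {"message": "GET /elasticsearch Mozilla/5.0 (X11; Linux x86_64)"},
                "highlight": {
                    "message": ["GET /elasticsearch <mark>Mozilla</mark>/5.0 (X11; Linux x86_64)"]
                },
            },
            {
                "_id": "2",
                "_source": {"message": "GET /kibana curl/7.68.0"},
            },
        ]
    }
}

for hit in sample_response["hits"]["hits"]:
    print(hit["_id"], snippets_for(hit, "message"))
```

Keeping the fallback in one place means the frontend always receives a list of strings, whether or not the highlighter produced output for that hit.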
Understanding key parameters makes highlighting predictable
highlight is the top-level configuration node that enables highlighting. fields declares which fields participate in highlighting, while highlight_query lets you decouple highlighting logic from the main query logic.
This decoupling is extremely practical. For example, when a user submits a complex expression, you can highlight only brand names, error codes, or core terms, avoiding excessive visual noise on the page.
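One way to apply this decoupling in application code is to build the request body from two separate inputs: the user's full expression and the term you want to emphasize. The sketch below only assembles a dictionary and prints it as JSON; `build_search_body`, the sample query text, and the choice of `<mark>` tags are illustrative assumptions.

```python
import json


def build_search_body(query_text, highlight_text, field="message"):
    """Search with one expression, highlight with another."""
    return {
        "query": {"match": {field: query_text}},
        "highlight": {
            # highlight_query replaces the main query for highlighting only.
            "highlight_query": {"match": {field: highlight_text}},
            "fields": {field: {"pre_tags": ["<mark>"], "post_tags": ["</mark>"]}},
        },
    }


# Hypothetical input: the user searched several terms, only the code is marked.
body = build_search_body("timeout error 503 gateway", "503")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request with whichever client you already use.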
The responsibilities of common fields can be reduced to a minimal model
{
  "highlight": {
    "fields": {
      "message": {}
    },
    "pre_tags": ["<span style='color:red'>"], // Define a global prefix tag
    "post_tags": ["</span>"] // Define a global suffix tag
  }
}
It answers two core questions: what to highlight and how to highlight it.
Choosing among the three highlighter types directly affects performance and accuracy
Elasticsearch provides three highlighters: unified, plain, and fvh. unified is the default and the usual recommendation, while fvh is often the better choice for large text bodies in production, provided the field is mapped with term vectors.
unified is the general-purpose default because it balances compatibility and accuracy. plain uses the standard Lucene highlighter and rebuilds a small in-memory index for each field of each document at query time, so it suits small-text experiments but not large bodies of content. fvh depends on term vectors and fits high-performance body highlighting and merged multi-field presentation.
FVH depends on correct mapping preconfiguration
PUT my_fvh_demo
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "term_vector": "with_positions_offsets", // Store term positions and offsets in advance
        "store": true // Store the field content separately
      }
    }
  }
}
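This mapping provides the index-time foundation that FVH requires. Without term_vector set to with_positions_offsets, a request for the fvh highlighter on that field is rejected with an error instead of falling back to another highlighter. store: true is optional; when it is omitted, the text to highlight is read from _source. Keep in mind that term vectors noticeably increase index size.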
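Because FVH depends on the mapping, it can help to check the mapping before asking for it. The following sketch inspects an index body shaped like the `PUT` request above and picks a highlighter type accordingly. `fvh_ready`, `pick_highlighter`, and the `title` field are hypothetical names introduced for illustration, and the input is assumed to be the index body itself rather than the response of a mapping API call.

```python
def fvh_ready(index_body, field):
    """Return True if `field` is mapped with the term vectors FVH needs."""
    parts = field.split(".")
    node = index_body.get("mappings", {}).get("properties", {}).get(parts[0], {})
    for part in parts[1:]:
        # Dotted names may refer to object properties or to multi-fields.
        children = {**node.get("properties", {}), **node.get("fields", {})}
        node = children.get(part, {})
    return node.get("term_vector") == "with_positions_offsets"


def pick_highlighter(index_body, field):
    """Request fvh only when the mapping supports it; otherwise use unified."""
    return "fvh" if fvh_ready(index_body, field) else "unified"


demo_index_body = {
    "mappings": {
        "properties": {
            "content": {"type": "text", "term_vector": "with_positions_offsets", "store": True},
            "title": {"type": "text"},  # hypothetical field without term vectors
        }
    }
}

print(pick_highlighter(demo_index_body, "content"))
print(pick_highlighter(demo_index_body, "title"))
```

Falling back to unified keeps the search working on indices that were created before the term-vector mapping was introduced.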
The FVH highlighter is ideal for long text and knowledge base search
When field content is long, the cost of analyzing _source at query time rises significantly. FVH stores positional information during indexing, which reduces repeated computation during queries. That makes it a better fit for articles, documents, and log bodies.
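If your system searches both a primary field and its analyzed subfields, FVH also supports merged highlighting through matched_fields. Older Elasticsearch releases offer this option only with FVH, while recent 8.x releases also accept it with the unified highlighter, so check the documentation for your version.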
The production-ready FVH query pattern is straightforward
GET /my_fvh_demo/_search
{
  "query": {
    "match": {
      "content": "ES highlighting FVH performance optimization" // Search topic terms in the body content
    }
  },
  "highlight": {
    "type": "fvh",
    "pre_tags": ["<mark>"],
    "post_tags": ["</mark>"],
    "fragment_size": 200, // Target about 200 characters per fragment
    "number_of_fragments": 2, // Return at most 2 highlighted fragments
    "no_match_size": 100, // Return a 100-character summary when there is no match
    "fields": {
      "content": {}
    }
  }
}
It delivers stable highlighting and summary extraction for large text fields.
Merged multi-field highlighting creates a consistent search experience
Many systems search content and content.english, or title and title.pinyin, at the same time. If each field returns its own highlight output, the frontend often struggles to present the results coherently.
FVH supports matched_fields, which merges matches from multiple fields into a single primary field output. This preserves recall without sacrificing presentation consistency. With FVH, every field listed in matched_fields must be indexed with term_vector set to with_positions_offsets.
Merged multi-field highlighting is a core FVH capability
GET /my_fvh_demo/_search
{
  "query": {
    "multi_match": {
      "query": "Fast Vector highlighter",
      "fields": ["content", "content.english"] // Search both the primary field and the subfield
    }
  },
  "highlight": {
    "type": "fvh",
    "fields": {
      "content": {
        "matched_fields": ["content", "content.english"] // Merge matches into the primary field
      }
    }
  }
}
Its value is simple: broaden the search scope while keeping the final presentation focused on one primary field.
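The request above references `content.english`, which the earlier `my_fvh_demo` mapping does not define. One possible mapping under which it could run is sketched below. It assumes the subfield uses the built-in `english` analyzer and gives both fields term vectors, since FVH needs them on every field listed in `matched_fields`. The constant name `FVH_TEXT` is only a convenience for this example.

```python
import json

# Term-vector settings shared by every field that FVH will read.
FVH_TEXT = {"type": "text", "term_vector": "with_positions_offsets"}

merged_mapping = {
    "mappings": {
        "properties": {
            "content": {
                **FVH_TEXT,
                "fields": {
                    # Assumed subfield: same text, analyzed with the english analyzer.
                    "english": {**FVH_TEXT, "analyzer": "english"}
                },
            }
        }
    }
}

# Body for an index-creation request, for example pasted into Kibana Console.
print(json.dumps(merged_mapping, indent=2))
```

Only the primary field's text is returned in the merged output, so the subfield exists purely to widen what counts as a match.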
Full-field highlighting fits log and troubleshooting platforms
By default, require_field_match is true, which means a field is highlighted only if the query matched on that field. If you set it to false, every field listed under fields can be highlighted as long as it contains the matched terms.
This is especially valuable on log platforms. You might query message, but also want contextual fields such as url and request to be highlighted for faster troubleshooting.
Disabling field match restrictions enables cross-field highlighting
GET kibana_sample_data_logs/_search
{
  "query": {
    "match": {
      "message": "Mozilla Firefox elastic" // The main query still searches only message
    }
  },
  "highlight": {
    "require_field_match": false, // Allow other fields to share the highlight condition
    "fields": {
      "message": {},
      "url": {},
      "request": {}
    }
  }
}
It makes “search one field, highlight multiple fields” possible.
Figure: result shape after enabling cross-field highlighting. Matched terms appear in the primary query field and in related display fields, illustrating how require_field_match=false improves result readability.
Figure: output before and after limiting the number of highlight fragments, showing how number_of_fragments keeps result lists and suggestion panels compact.
Fragment count and ordering settings determine frontend presentation quality
number_of_fragments controls how many highlighted fragments to return, fragment_size controls the length of each fragment, and order determines whether fragments are returned by relevance or by original text order.
In practice, search result pages should usually limit fragment count first, while detail pages can relax that constraint. If you want the most relevant content first, use score. If you want to preserve reading context, use none.
Fragment control parameters are usually combined
{
  "highlight": {
    "order": "score", // Sort highlighted fragments by relevance
    "number_of_fragments": 1, // Return only the most important fragment
    "fragment_size": 150, // Limit each fragment to 150 characters
    "fields": {
      "message": {
        "number_of_fragments": 2 // Override the global setting for a specific field
      }
    }
  }
}
This type of configuration helps balance information completeness and page compactness.
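The override rule in that configuration, where field-level settings win over global ones, can be modeled in a few lines. This is a sketch of the precedence only, not Elasticsearch's own code; `effective_settings` is a hypothetical helper and the extra `url` field is added purely to show a field with no overrides.

```python
def effective_settings(highlight, field):
    """Merge global highlight settings with one field's overrides."""
    global_settings = {k: v for k, v in highlight.items() if k != "fields"}
    # Field-level keys are applied last, so they replace global values.
    return {**global_settings, **highlight.get("fields", {}).get(field, {})}


highlight = {
    "order": "score",
    "number_of_fragments": 1,
    "fragment_size": 150,
    "fields": {
        "message": {"number_of_fragments": 2},
        "url": {},  # hypothetical field that inherits every global setting
    },
}

print(effective_settings(highlight, "message"))
print(effective_settings(highlight, "url"))
```

A helper like this is handy in tests, where you can assert what each field will actually receive before sending the request.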
Highlighting strategy in production should prioritize accuracy before performance
If you are just starting to integrate highlighting, begin with unified to validate your business flow. When you move into knowledge bases, long documents, or centralized search platforms, upgrade to fvh through proper mapping configuration.
In short: use unified for short text, use fvh for very long bodies of content, and avoid plain in production body-text scenarios. At the same time, limit fragment count and fragment length first to prevent response payload bloat.
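To see why fragment limits matter for payload size, a back-of-the-envelope calculation is often enough. The sketch below multiplies hits, fields, fragments, and fragment length. Treat the result as a rough figure, because fragment boundaries are not exact and the tag overhead here assumes a single `<mark>...</mark>` pair per fragment. The function name and sample numbers are illustrative.

```python
def estimate_highlight_chars(hits, fields, number_of_fragments, fragment_size,
                             tag_chars=len("<mark>") + len("</mark>")):
    """Rough estimate of highlight characters returned for one result page."""
    # Assumes one tagged term per fragment; more matches add more tag overhead.
    per_fragment = fragment_size + tag_chars
    return hits * fields * number_of_fragments * per_fragment


# Compact list page: 20 hits, 1 field, 1 fragment of about 150 characters.
compact = estimate_highlight_chars(20, 1, 1, 150)

# Looser settings: 20 hits, 3 fields, 5 fragments of about 150 characters.
loose = estimate_highlight_chars(20, 3, 5, 150)

print(compact, loose, loose // compact)
```

Under these assumptions the looser settings return fifteen times as much highlight text for the same page of hits.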
FAQ
1. Which highlighter does Elasticsearch use by default?
The default is unified. It offers the best compatibility, usually requires no additional mapping, and suits most general-purpose search scenarios.
2. Why does the FVH highlighter perform better?
Because it relies on term_vector: with_positions_offsets to store term positions and offsets in advance, it avoids expensive real-time analysis of large text during queries.
3. When should I set require_field_match=false?
Use it when you want to separate the query field from the highlighted display fields. For example, query message while also highlighting related fields such as url and request.