Technical Specification Snapshot
| Parameter | Description |
|---|---|
| Core System | Elasticsearch / Lucene |
| Primary Language | Java |
| Access Protocol | HTTP REST |
| Storage Model | Inverted Index + Segment |
| Delete Mechanism | .del marker deletion |
| Update Mechanism | Delete old document + add new document |
| Merge Policy | TieredMergePolicy |
| Core Dependencies | Lucene, BitSet, DocValues |
| Source | Originally published as a technical analysis article on CSDN |
Lucene’s immutable segments define the true semantics of updates and deletes
Many developers approach Elasticsearch for the first time with a relational database mindset: locate a row, then update or delete it in place. Elasticsearch does not work that way internally, because Lucene stores its index as multiple segments, and once a segment is written, it becomes immutable.
This design may seem to reduce write flexibility, but in practice it delivers stronger concurrent read performance, more stable page cache hit rates, and better compression efficiency. The trade-off is clear: a document cannot be rewritten in its original location. Elasticsearch must apply changes through a combination of “mark the old version as invalid” and “write the new content.”
Segment structure is best understood as a set of read-only inverted index units
Index
├── Segment A          # Read-only segment that cannot be modified after being flushed to disk
│   ├── Inverted Index # Term-to-posting-list mapping
│   ├── DocValues      # Columnar storage
│   └── Stored Fields  # Fields such as _source
├── Segment B
└── Segment C
This structure shows that Lucene’s basic management unit is not a single document, but a collection of read-only segments.
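The read-only nature of segments can be observed directly through Lucene's own API. Below is a minimal Java sketch, assuming a local Lucene index directory (the path is a placeholder), that iterates over the segment-level readers and prints their live and deleted document counts:

```java
import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.FSDirectory;

public class SegmentInspector {
    public static void main(String[] args) throws Exception {
        // Placeholder path to a Lucene index directory; replace with a real one
        try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/es-demo-index"));
             DirectoryReader reader = DirectoryReader.open(dir)) {
            // Each leaf corresponds to one immutable segment
            for (LeafReaderContext leaf : reader.leaves()) {
                System.out.printf("segment: maxDoc=%d, liveDocs=%d, deletedDocs=%d%n",
                        leaf.reader().maxDoc(),
                        leaf.reader().numDocs(),
                        leaf.reader().numDeletedDocs());
            }
        }
    }
}
```

Each LeafReaderContext is a read-only view of one segment; there is no writable handle to modify a document in place.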
Delete operations are logically deleted rather than physically removed
When you execute DELETE /index/_doc/123, Elasticsearch does not immediately erase the document content, nor does it rewrite the entire segment. Instead, it first locates the segment and internal DocID for that document, then writes that DocID into a deletion marker set.
The most common carrier is the .del file. During query execution, the inverted index may still return that DocID, but the executor performs an additional check against the deletion bitmap and filters out deleted documents. As a result, the delete becomes visible to search immediately, while disk space reclamation is deferred.
.del files typically maintain delete state as a bitmap
import java.util.BitSet;

public class DelFile {
    // Use a bitmap to record whether a DocID has been deleted
    private final BitSet deletedBitSet;

    public DelFile(int maxDoc) {
        this.deletedBitSet = new BitSet(maxDoc); // One bit per DocID in the segment
    }

    public boolean isDeleted(int docId) {
        return deletedBitSet.get(docId); // Check whether a document has been marked as deleted
    }

    public void markDeleted(int docId) {
        deletedBitSet.set(docId); // Mark the target DocID as deleted
    }
}
This code demonstrates the core idea behind .del: do not modify the original document; maintain only the deletion state.
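In real Lucene terms, the same idea surfaces as the per-segment live-docs bitmap. A hedged sketch of the query-side check, assuming an already opened LeafReader (the names here are illustrative):

```java
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.util.Bits;

public class LiveDocsFilter {
    // Returns true if the document is still visible to search
    static boolean isLive(LeafReader segmentReader, int docId) {
        Bits liveDocs = segmentReader.getLiveDocs();
        // getLiveDocs() returns null when the segment has no deletions at all
        return liveDocs == null || liveDocs.get(docId);
    }
}
```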
Deleted documents remain until segment merge occurs
A delete does not immediately reduce disk usage. The old document content, inverted lists, and stored fields all remain in the original segment, but they are no longer visible externally. Only when a background segment merge runs will the new segment skip those deleted documents. The old segment and its .del file are then reclaimed together.
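The delayed reclamation is also visible at the Lucene layer. The sketch below, assuming an open IndexWriter and an "id" field used as the document key (both assumptions), deletes a document, shows that the deletion is still only a marker, and then lets a merge drop the old data:

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteThenMerge {
    static void demo(IndexWriter writer) throws Exception {
        // Logical delete: only a marker is recorded, the segment data stays on disk
        writer.deleteDocuments(new Term("id", "123"));
        writer.commit();

        try (DirectoryReader reader = DirectoryReader.open(writer.getDirectory())) {
            // numDeletedDocs() is still > 0: the old document occupies space in its segment
            System.out.println("deleted but not reclaimed: " + reader.numDeletedDocs());
        }

        // Merging segments with deletions is what actually drops the old data
        writer.forceMergeDeletes();
        writer.commit();

        try (DirectoryReader reader = DirectoryReader.open(writer.getDirectory())) {
            System.out.println("after merge: " + reader.numDeletedDocs()); // typically back to 0
        }
    }
}
```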
Update operations are effectively delete old version plus add new version
The _update API is often misunderstood as a true partial field modification. In reality, it provides partial update semantics at the API level, but internally Elasticsearch still needs to read the original _source, merge the fields, mark the old version as deleted, and reindex the fully merged document.
That is why updates are always more expensive than pure inserts. At minimum, an update involves one read, one delete marker write, and one rewrite. It also introduces downstream segment merge costs.
The update flow can be summarized in five steps: read the original _source, merge in the changed fields, mark the old version deleted, index the merged document as a new one, and advance the version metadata
POST /my_index/_update/123
{
  "doc": {
    "title": "new title"
  }
}
At the storage layer, this request does not only modify the title field. It triggers a full rebuild followed by reindexing.
The internal Elasticsearch process can be abstracted as the following pseudo-flow
def update_document(old_doc, patch, del_file, writer):
    # old_doc is the original _source that Elasticsearch has already read back
    new_doc = {**old_doc, **patch}             # Merge the original document with the updated fields first
    del_file.mark_deleted(old_doc["_doc_id"])  # Mark the old version as deleted
    writer.index(new_doc)                      # Write the new version into a new segment as a brand-new document
    return new_doc                             # The new version carries advanced version/sequence metadata
This pseudo-code makes the mechanism explicit: an update is never an in-place rewrite. It means “invalidate the old version, then write the new version into a segment.”
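The same contract is what Lucene exposes directly: IndexWriter.updateDocument atomically deletes whatever matches the given term and appends the new document. A minimal sketch with illustrative field names:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class LuceneUpdate {
    static void updateTitle(IndexWriter writer, String id, String newTitle) throws Exception {
        Document newVersion = new Document();
        newVersion.add(new StringField("id", id, Field.Store.YES));
        newVersion.add(new TextField("title", newTitle, Field.Store.YES));

        // Not an in-place edit: the old document matching the term is marked deleted
        // and the new version is appended to a fresh segment
        writer.updateDocument(new Term("id", id), newVersion);
        writer.commit();
    }
}
```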
Version control resolves concurrent update conflicts
Each time a document changes, Elasticsearch advances its version or sequence metadata. The purpose is not auditing. It is to support optimistic locking and prevent two clients from overwriting each other’s updates when they operate on the same stale snapshot.
In high-concurrency write scenarios, you should explicitly use if_seq_no and if_primary_term. That way, the update succeeds only if the current document version still matches the expected one.
POST /my_index/_update/123?if_seq_no=1&if_primary_term=1
{
  "doc": {
    "title": "new title"
  }
}
This request means: execute the update only if the version has not changed, so concurrent writes do not overwrite each other.
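In application code, this usually takes the form of a read-then-conditionally-write loop. The sketch below uses only the standard Java HTTP client against the endpoint shown above; the host, document ID, and retry handling are assumptions about your setup rather than a fixed recipe, and response parsing is omitted for brevity:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OptimisticUpdate {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    static boolean updateTitle(long seqNo, long primaryTerm) throws Exception {
        String body = "{\"doc\": {\"title\": \"new title\"}}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/my_index/_update/123"
                        + "?if_seq_no=" + seqNo + "&if_primary_term=" + primaryTerm))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        // 409 means another writer changed the document since seqNo/primaryTerm were read;
        // the caller should re-read the document and retry with the fresh values
        return response.statusCode() != 409;
    }
}
```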
Segment merge is the moment when space is actually reclaimed and old versions are removed
As refreshes continue, the index accumulates many small segments. As updates and deletes increase, .del markers also accumulate. At that point, queries must scan more segments and repeatedly filter invalid documents, so performance gradually declines.
The job of segment merge is to rewrite multiple small segments into a larger new segment while skipping all deleted documents. The new segment carries no historical baggage. After the switch completes, the old segments and their .del files can be removed.
Manual merge is suitable only for low-write or read-only indexes
POST /my_index/_forcemerge?max_num_segments=1
This command forces segment merging, but it consumes significant CPU and I/O resources, so it is not appropriate for actively written indexes.
Automatic merge behavior is usually controlled by TieredMergePolicy
The system decides when to merge by evaluating factors such as segment count, segment size, and deletion ratio. The more deleted documents a segment contains, the more likely it is to be selected for merging. Even so, this remains a background trade-off strategy rather than a real-time cleanup mechanism.
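At the Lucene layer, those factors map onto knobs on TieredMergePolicy itself. A hedged sketch of wiring a tuned policy into an IndexWriter; the specific values are illustrative, not recommendations:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;

public class MergePolicySetup {
    static IndexWriter openWriter(Directory dir) throws Exception {
        TieredMergePolicy policy = new TieredMergePolicy();
        policy.setSegmentsPerTier(10.0);        // how many similarly sized segments per tier before merging
        policy.setMaxMergedSegmentMB(5 * 1024); // cap the size of merged segments
        policy.setDeletesPctAllowed(20.0);      // segments with more deletions become merge candidates sooner

        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setMergePolicy(policy);
        return new IndexWriter(dir, config);
    }
}
```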
High-frequency updates and deletes directly amplify write cost
From a resource perspective, an insert only needs to write a new segment. A delete must maintain delete markers. An update combines both delete and write costs. That makes frequent updates one of the most expensive write operations in many systems.
If your workload includes a high rate of document changes, common side effects include disk growth, higher query filtering overhead, busy merge threads, and increased I/O jitter. For workloads such as logs or order state streams, you should usually combine Index Lifecycle Management with hot-warm-cold tiering.
Common optimization strategies should focus on reducing unnecessary rewrites
- Batch updates are better than one-by-one updates (see the _bulk sketch after this list)
- When the delete ratio is too high, rebuilding the index is often better than keeping .del files around indefinitely
- For read-only historical indexes, you can run forcemerge during off-peak hours
- For time-series data, prefer ILM or Data Streams
- For high-write indexes, deploy on high-performance SSD nodes
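As an illustration of the first point, routing document changes through the _bulk endpoint amortizes per-request overhead. A minimal sketch with the standard Java HTTP client; the host, index name, and IDs are placeholders, and the newline-delimited body must end with a trailing newline:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BulkUpdateExample {
    public static void main(String[] args) throws Exception {
        // Each update is a pair of lines: an action line and a partial-doc line
        String body =
                "{\"update\":{\"_index\":\"my_index\",\"_id\":\"123\"}}\n" +
                "{\"doc\":{\"title\":\"new title\"}}\n" +
                "{\"update\":{\"_index\":\"my_index\",\"_id\":\"124\"}}\n" +
                "{\"doc\":{\"title\":\"another title\"}}\n";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_bulk"))
                .header("Content-Type", "application/x-ndjson")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // per-item results are reported in the response body
    }
}
```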
The key conclusions for interviews and troubleshooting are straightforward
First, disk space does not drop immediately after deletion because a delete is only a logical marker. Physical cleanup depends on segment merge. Second, an update is not a partial in-place modification. It deletes the old version and writes a new one. Third, the real cost of frequent changes often comes not from the API call itself, but from the merge rewrite work that follows.

The core document change path can be pictured in three stages: the old document remains in its original segment but is masked by the deletion bitmap; the new version is written into a new segment; and a background merge later rebuilds the inverted index and reclaims old segment space. This two-phase behavior is why query visibility changes first, while disk usage changes later.
FAQ
Why doesn’t disk space shrink immediately after deleting a document?
Because deletion only writes a .del marker, while the old document data remains in the original segment. Disk space is reclaimed only after segment merge creates a new segment and discards the old one.
Why does _update look like a partial update but still cost so much?
Because Elasticsearch must first read the original _source, then merge fields, mark the old version as deleted, and reindex the complete document. It is not a field-level in-place update. It is a full document rewrite.
When should you consider running forcemerge manually?
It is suitable for archived or read-only indexes where writes have stopped and you want to reclaim space and improve read performance. It is not suitable for actively written indexes because it creates noticeable I/O and CPU pressure.
Core summary
This article systematically breaks down the internal mechanics of Elasticsearch document updates and deletes. It explains why Lucene’s immutable segment model defines “delete = marker” and “update = delete + insert,” and it clarifies the role of .del files, version control, segment merge, and the resulting performance implications. It is especially useful for developers troubleshooting write amplification and delayed disk space reclamation.