Elasticsearch Document Update and Delete Guide: 7 APIs, Best Practices, and Production Pitfalls

This article focuses on the core Elasticsearch APIs for document updates and deletions, addressing common development pain points such as “Should I use PUT, _update, or _bulk?” and “How do I avoid accidental deletes or overwriting fields?” Keywords: Elasticsearch, document update, bulk delete.

Technical Item | Specification
Language | REST API / JSON
Protocol | HTTP
Applicable Component | Elasticsearch
Core Dependencies | Elasticsearch cluster, index mappings, Kibana or curl

Elasticsearch document updates and deletes are near real-time operations built on a rewrite mechanism

An Elasticsearch update is not an in-place modification like in a traditional database. At the storage layer, it is closer to “write a new version and mark the old document as deleted.” Search sees the change only after the next refresh (roughly once per second by default), which is why updates and deletes are near real-time: a GET by ID returns the latest version immediately, but a search may briefly return the old one.
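
The rewrite behavior described above can be illustrated with a toy model. The sketch below is not how Lucene or Elasticsearch is implemented; ToyIndex and its methods are hypothetical names made up for this example. It assumes an append-only log in which a later entry supersedes an earlier one, a refresh step that publishes entries to search, and a GET by ID that reads the latest entry directly (in Elasticsearch, GET by ID is real-time by default, while search sees only refreshed data).

```python
class ToyIndex:
    """Toy append-only index: writes are logged, search sees only refreshed entries."""

    def __init__(self):
        self._log = []      # append-only list of (doc_id, source); source None marks a delete
        self._visible = 0   # number of log entries already published to search

    def write(self, doc_id, source):
        # An update never edits an old entry; it appends a new one that supersedes it.
        self._log.append((doc_id, None if source is None else dict(source)))

    def delete(self, doc_id):
        # A delete is also an append: a marker that invalidates earlier versions.
        self.write(doc_id, None)

    def refresh(self):
        # Publish everything written so far to search.
        self._visible = len(self._log)

    @staticmethod
    def _latest(entries):
        latest = {}
        for doc_id, source in entries:
            latest[doc_id] = source  # later entries win
        return {doc_id: src for doc_id, src in latest.items() if src is not None}

    def get(self, doc_id):
        # Real-time read: consults the whole log, refreshed or not.
        return self._latest(self._log).get(doc_id)

    def search(self, field, value):
        # Near real-time read: consults only entries published by the last refresh.
        docs = self._latest(self._log[: self._visible])
        return sorted(doc_id for doc_id, src in docs.items() if src.get(field) == value)


index = ToyIndex()
index.write("1", {"city": "北京"})
index.refresh()
index.write("1", {"city": "上海"})           # update: a new version is appended
stale_hits = index.search("city", "北京")    # search still sees the old version
current = index.get("1")                     # GET already returns the new version
index.refresh()
fresh_hits = index.search("city", "上海")    # after refresh, search sees the new version
```

In this model, stale_hits still contains document 1 while current already holds the new city, which is the gap that a refresh closes.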

From an API perspective, updates fall into four categories: full replacement, partial update, update-if-exists-and-insert-if-not, and bulk update. Deletes fall into three categories: delete by ID, bulk delete, and delete by query. The key to choosing the right API is not whether it can get the job done, but whether it is safe and efficient.

Choose the operation path based on business granularity first

  • If you already have the full document: prefer PUT /index/_doc/id
  • If you only need to modify some fields: prefer POST /index/_update/id
  • If you are not sure whether the document exists: use upsert
  • If you need to process multiple documents at once: use _bulk
  • If you need to delete a matched result set: use _delete_by_query with caution
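
The decision list above can be written down as a small lookup. This is only a sketch of the article's rules of thumb: choose_operation and its keyword flags are hypothetical names, and the returned strings use index and id as placeholders rather than real values.

```python
def choose_operation(*, delete=False, by_query=False, many_docs=False,
                     full_document=False, may_not_exist=False):
    """Map the decision list to a request line (index and id are placeholders)."""
    if delete and by_query:
        # Deleting a matched result set: the riskiest path, use with caution.
        return "POST /index/_delete_by_query"
    if many_docs:
        # Several documents at once, whether updates or deletes.
        return "POST /index/_bulk"
    if delete:
        return "DELETE /index/_doc/id"
    if full_document:
        # A complete document is available, so replacement is safe.
        return "PUT /index/_doc/id"
    if may_not_exist:
        return "POST /index/_update/id (with upsert)"
    # Default: change only the given fields.
    return "POST /index/_update/id"


routine_change = choose_operation()
sync_from_queue = choose_operation(may_not_exist=True)
cleanup = choose_operation(delete=True, by_query=True)
```

The default branch is the partial update, which matches the article's recommendation for day-to-day changes.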

# Create a test index and define field mappings.
# name: text (full-text search); age: integer;
# city and email: keyword (exact-value matching).
# Comments stay outside the request body because JSON does not allow # comments.
curl -X PUT "localhost:9200/user" -H "Content-Type: application/json" -d '{
  "mappings": {
    "properties": {
      "name":  { "type": "text" },
      "age":   { "type": "integer" },
      "city":  { "type": "keyword" },
      "email": { "type": "keyword" }
    }
  }
}'

This code initializes the user index and its basic mappings so you can reproduce all update and delete examples that follow.

Creating minimal test data is the prerequisite for verifying update semantics

Insert one standard document first, then verify full replacement updates and partial updates separately; this is the easiest way to see the difference. A full replacement update requires a complete document: any field not included in the request is dropped from the stored document.

# Insert a test document (initial name, age, city, and email) as the baseline
# for subsequent update and delete operations
curl -X POST "localhost:9200/user/_doc/1" -H "Content-Type: application/json" -d '{
  "name": "张三",
  "age": 25,
  "city": "北京",
  "email": "[email protected]"
}'

This code writes a test sample. All subsequent API examples can use the document with ID 1.

Full replacement updates are suitable for replacing an entire object

PUT /user/_doc/1 replaces the old document with a new one. Its advantage is straightforward semantics, which makes it suitable for syncing external master data. Its drawback is that the stored document becomes exactly the payload, so any field you leave out is lost.

{
  "name": "张三",
  "age": 26,
  "city": "上海",
  "email": "[email protected]"
}

This request body fully replaces the same document and works well when an upstream system provides a complete entity snapshot.

Partial field updates should be your default choice in day-to-day development

POST /user/_update/1 changes only the specified fields and leaves undeclared fields untouched. This is the most common and safest approach, especially for incremental changes such as order status, user attributes, and tag fields.

{
  "doc": {
    "age": 27,
    "city": "深圳"
  }
}

This request body updates only age and city. name and email remain unchanged.
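
The difference between the two update styles can be reproduced locally with plain dictionaries. The functions below are a model of the semantics described above, not Elasticsearch code: full_replace and partial_update are hypothetical helper names, the nested-object merge is an assumption of this sketch, and the email value is a placeholder.

```python
import copy


def full_replace(existing, new_source):
    """Model of PUT /index/_doc/id: the stored source becomes exactly new_source."""
    return copy.deepcopy(new_source)


def partial_update(existing, doc):
    """Model of POST /index/_update/id with {"doc": ...}: merge doc into existing."""
    merged = copy.deepcopy(existing)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field (assumption of this model).
            merged[key] = partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder document; the email address is made up for this example.
stored = {"name": "张三", "age": 25, "city": "北京", "email": "user@example.com"}
change = {"age": 27, "city": "深圳"}

after_put = full_replace(stored, change)       # name and email are gone
after_update = partial_update(stored, change)  # name and email survive
```

Sending the same two-field payload through both paths shows why an incomplete body is harmless for _update but destructive for PUT.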

Upsert and bulk are the two most important enhanced capabilities in enterprise scenarios

When you cannot be sure whether a document already exists, a plain _update is risky: if the document is absent, it fails with a 404 (document_missing_exception). Upsert provides the atomic behavior of “update if it exists, insert if it does not.” It is ideal for synchronization pipelines, consumer queues, and idempotent writes.

{
  "doc": {
    "name": "李四",
    "age": 28
  },
  "upsert": {
    "name": "李四",
    "age": 28,
    "city": "杭州",
    "email": "[email protected]"
  }
}

This request body gives _update both update and insert behavior, making it a good fit for data synchronization tasks where document existence is uncertain.
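
To make the “update if it exists, insert if it does not” behavior concrete, here is a minimal in-memory model. It is a sketch under stated assumptions, not the Elasticsearch implementation: apply_upsert and the store dictionary are hypothetical, the merge is shallow, and the "noop" branch mirrors the default no-op detection of the update API when a replayed message changes nothing.

```python
def apply_upsert(store, doc_id, doc, upsert):
    """Model of POST /index/_update/id with both "doc" and "upsert" in the body."""
    if doc_id not in store:
        # Document is missing: the upsert body is inserted as-is, doc is ignored.
        store[doc_id] = dict(upsert)
        return "created"
    merged = {**store[doc_id], **doc}
    if merged == store[doc_id]:
        # Nothing would change, so a replayed message is harmless.
        return "noop"
    store[doc_id] = merged
    return "updated"


store = {}
doc = {"name": "李四", "age": 28}
upsert = {"name": "李四", "age": 28, "city": "杭州"}

first = apply_upsert(store, "2", doc, upsert)             # document did not exist yet
replay = apply_upsert(store, "2", doc, upsert)            # same message delivered twice
later = apply_upsert(store, "2", {"age": 29}, upsert)     # a real change
```

Replaying the same message leaves the stored document untouched, which is the property that makes upsert a good fit for at-least-once delivery from a queue.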

Bulk updates and bulk deletes should both use _bulk first

The core value of _bulk is reducing network round trips and request overhead. It does not automatically guarantee that every operation succeeds, so in production you must inspect the errors field and each individual execution result in the response.
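
The inspection step described above could look like the following sketch. failed_bulk_items is a hypothetical helper, and the sample response is hand-written for illustration (it was not captured from a cluster); it follows the documented shape in which each entry of items is keyed by its action and a failed entry carries an error object.

```python
def failed_bulk_items(response):
    """Return (position, action, id, status, error type) for every failed item."""
    failures = []
    if not response.get("errors"):
        # The top-level flag is false when no item failed.
        return failures
    for position, item in enumerate(response.get("items", [])):
        # Each item is a one-key dict: {"update": {...}} or {"delete": {...}}, etc.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result.get("_id"),
                             result.get("status"), result["error"].get("type")))
    return failures


# Hand-written sample: the second action targets a document that does not exist.
sample_response = {
    "took": 5,
    "errors": True,
    "items": [
        {"update": {"_index": "user", "_id": "1", "status": 200, "result": "updated"}},
        {"update": {"_index": "user", "_id": "2", "status": 404,
                    "error": {"type": "document_missing_exception",
                              "reason": "illustrative reason text"}}},
        {"delete": {"_index": "user", "_id": "3", "status": 200, "result": "deleted"}},
    ],
}

failures = failed_bulk_items(sample_response)
```

Keeping the position in the result lets a caller map each failure back to the action it submitted and retry only that action.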

{ "update": { "_id": "1" } }
{ "doc": { "age": 29 } }
{ "update": { "_id": "2" } }
{ "doc": { "city": "广州" } }
{ "delete": { "_id": "3" } }

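This bulk payload shows how to mix updates and deletes in a single request. Because the action lines omit _index, send it to POST /user/_bulk; the body is newline-delimited JSON (Content-Type: application/x-ndjson) and must end with a newline. It is the standard pattern for high-throughput write pipelines.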
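
Hand-assembling a bulk payload is error-prone, so it is usually generated. The sketch below builds the same body as the example above; to_bulk_body and its (action, id, payload) tuple format are assumptions of this example, and it handles only update and delete. It only produces the text; sending it to the _bulk endpoint of the target index is left out.

```python
import json


def to_bulk_body(actions):
    """Render (action, doc_id, payload) tuples as a newline-delimited bulk body."""
    lines = []
    for action, doc_id, payload in actions:
        if action not in ("update", "delete"):
            raise ValueError(f"unsupported action in this sketch: {action}")
        # Action line: which operation to run on which document.
        lines.append(json.dumps({action: {"_id": doc_id}}))
        if action == "update":
            # An update is followed by its partial document; a delete has no second line.
            lines.append(json.dumps({"doc": payload}, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = to_bulk_body([
    ("update", "1", {"age": 29}),
    ("update", "2", {"city": "广州"}),
    ("delete", "3", None),
])
```

Generating the body from structured input also makes it easy to keep the submitted actions around, so they can be matched against the per-item results in the response.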

Delete APIs carry significantly higher risk than update APIs

Single-document delete is the safest option: DELETE /user/_doc/1. It is suitable when the business logic needs to remove one specific object. A successful response includes "result": "deleted". If the document does not exist, Elasticsearch returns HTTP 404 with "result": "not_found", which you can treat as success for idempotent handling.
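
Idempotent handling of a single delete can be sketched as a small classifier over the HTTP status and the parsed response body. classify_delete is a hypothetical helper; it assumes the caller has already performed the request and decoded the JSON, and it treats a missing document (HTTP 404 with "result": "not_found") as an acceptable outcome.

```python
def classify_delete(status, body):
    """Interpret the response to DELETE /index/_doc/id for idempotent callers."""
    result = body.get("result")
    if status == 200 and result == "deleted":
        return "deleted"
    if status == 404 and result == "not_found":
        # The document was already gone: the desired end state is reached.
        return "already_absent"
    # Anything else (for example an error body without "result") needs attention.
    raise RuntimeError(f"unexpected delete response: {status} {body}")


first_call = classify_delete(200, {"_id": "1", "result": "deleted"})
second_call = classify_delete(404, {"_id": "1", "result": "not_found"})
```

Checking the result field as well as the status matters here, because a 404 without "result" (such as a missing index) is a real error rather than an already-deleted document.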

_delete_by_query is very different. It scans and deletes every document that matches the query. If the condition is wrong, the blast radius is often index-wide. That is why you must run _search with the same DSL first to validate the matched set before using it in production.

{
  "query": {
    "term": {
      "city": "北京"
    }
  }
}

This query body deletes all documents where city=北京. Before you execute it, switch the endpoint to _search first and verify the number of matches.

You can use the comparison table below to choose the right operation directly

Operation Type | HTTP Method | Endpoint | Advantage | Recommended Scenario
Full update | PUT | /index/_doc/ID | Clear semantics | Full object replacement
Partial update | POST | /index/_update/ID | Safe and field-granular | Routine field changes
Upsert | POST | /index/_update/ID | Atomic write | Idempotent synchronization
Bulk update/delete | POST | /index/_bulk | High throughput and low network overhead | Batch jobs
Single delete | DELETE | /index/_doc/ID | Precise and controllable | Delete a specific document
Delete by query | POST | /index/_delete_by_query | Flexible | Data cleanup, historical archiving

Figure: Elasticsearch update and delete flow diagram. Client requests enter Elasticsearch and split into an update path (full replacement, partial update) and a delete path (single delete, bulk or conditional delete). The diagram covers API selection, not the underlying storage structure.

You must avoid five high-frequency production pitfalls in advance

  • Partial updates must wrap the changed fields inside doc; otherwise Elasticsearch rejects the request.
  • Updates and deletes are near real-time for search, so do not make strict assertions on search results immediately after a write. When a test or workflow needs read-after-write visibility, add refresh=wait_for to the write request.
  • _delete_by_query must always follow the pattern of search first, then delete.
  • Full replacement updates drop every field that is missing from the payload.
  • Disk space is not reclaimed immediately after deletion; it is freed by later segment merges.

# Search first, then delete, to avoid accidental deletes caused by incorrect conditions.
# Confirm the matched document set before reusing the same query with _delete_by_query.
curl -X POST "localhost:9200/user/_search" -H "Content-Type: application/json" -d '{
  "query": {
    "term": {
      "city": "北京"
    }
  }
}'

This command demonstrates the safety validation flow before a conditional delete and should be part of your production operating standard.
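
The search-first rule can also be enforced in code rather than by convention. The sketch below is one possible guard, not an Elasticsearch feature: guarded_delete_by_query, count_matches, delete_matches, and max_expected are hypothetical names. The two callables are injected so that real wrappers (for example around _count and _delete_by_query) can be plugged in; here they are replaced by in-memory stand-ins.

```python
def guarded_delete_by_query(count_matches, delete_matches, body, max_expected):
    """Count with the same query body first; delete only if the match set is plausible."""
    matched = count_matches(body)
    if matched > max_expected:
        raise RuntimeError(
            f"refusing to delete: {matched} documents match, expected at most {max_expected}")
    if matched == 0:
        return 0
    return delete_matches(body)


# In-memory stand-ins for the two cluster calls (illustration only).
docs = {"1": {"city": "北京"}, "2": {"city": "上海"}, "3": {"city": "北京"}}


def _matching_ids(body):
    field, value = next(iter(body["query"]["term"].items()))
    return [doc_id for doc_id, source in docs.items() if source.get(field) == value]


def fake_count(body):
    return len(_matching_ids(body))


def fake_delete(body):
    ids = _matching_ids(body)
    for doc_id in ids:
        del docs[doc_id]
    return len(ids)


body = {"query": {"term": {"city": "北京"}}}
deleted = guarded_delete_by_query(fake_count, fake_delete, body, max_expected=5)
```

Passing the same body object to both callables guarantees that the validated query and the executed query cannot drift apart, and the upper bound turns a wrong condition into an error instead of an index-wide delete.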

The conclusion is that partial updates and bulk APIs should be your default strategy

If you remember only three rules, make them these: prefer _update for routine changes, standardize on _bulk for batch processing, and always run _search before _delete_by_query. These three practices already cover most enterprise Elasticsearch document maintenance scenarios.

When data comes from an external master system and naturally arrives as a complete snapshot, consider a replacement update with PUT. When the pipeline requires idempotent writes, prioritize upsert. The APIs themselves are not complicated. What determines stability is whether you choose the right operation based on data granularity and risk level.

FAQ

1. Why does an Elasticsearch update look like a modification, but internally behave more like a rewrite?

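Because Elasticsearch is built on inverted indexes stored in segment files, and a segment is immutable once written, documents are not modified in place. Instead, Elasticsearch writes a new version and marks the old one as deleted; the old copy is physically removed only when segments merge. As a result, even a one-field partial update reindexes the whole document, so update cost depends on the size of the document as well as on refresh cadence.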

2. Why is _update generally recommended over PUT?

_update changes only the specified fields and does not overwrite other fields when the request body is incomplete. That makes it a better fit for incremental business attribute updates, with better safety and maintainability in most cases.

3. What is the most important thing to watch out for when using _delete_by_query in production?

The most important rule is to run _search with the same condition first to validate the match scope, and execute delete jobs during off-peak hours when possible. Otherwise, an incorrect DSL condition can trigger large-scale accidental deletion with a very high recovery cost.

Core summary: This article covers Elasticsearch document update and delete operations: full replacement with PUT, partial updates with _update, upsert, batch updates with _bulk, single deletes with DELETE, and conditional deletes with _delete_by_query. It also explains near real-time behavior, execution mechanics, accidental delete risk, and production best practices.