PostgreSQL CDC can integrate directly into an application through the Debezium Embedded Engine, capture WAL changes, and emit structured events. This approach solves the high latency, invasive design, and heavy maintenance burden of traditional polling. Keywords: PostgreSQL CDC, Debezium, WAL logical replication.
The technical specification snapshot outlines the core setup.
| Parameter | Description |
|---|---|
| Core language | Java |
| Data source | PostgreSQL |
| Change protocol | WAL logical replication |
| Integration model | Debezium Embedded Engine |
| Core dependencies | debezium-embedded, debezium-connector-postgres |
| Runtime model | In-process CDC consumer inside the application |
| GitHub stars | Not provided in the source content; refer to the official Debezium repository |
This approach fits direct embedding inside business systems.
Compared with a Kafka Connect deployment, the Embedded Engine is better suited for small to mid-sized projects, proof-of-concept validation, and customized synchronization tasks. It brings CDC capabilities directly into the Java process and reduces the number of standalone components.
Its core value is straightforward: as soon as the database performs an insert, update, or delete, the application can receive change events in near real time. This makes it possible to unify auditing, search indexing, cache refresh, and heterogeneous data synchronization through a single pipeline.
The change data capture pipeline is easy to understand.
- PostgreSQL writes data changes into WAL.
- Debezium subscribes to WAL through a logical replication slot.
- The Embedded Engine parses the changes and invokes the application callback.
- The application writes events to logs, message queues, or downstream storage.
Properties props = new Properties();
props.setProperty("name", "pg-cdc-engine");
props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
props.setProperty("database.hostname", "127.0.0.1");
props.setProperty("database.port", "5432");
props.setProperty("database.user", "postgres");
props.setProperty("database.password", "postgres");
props.setProperty("database.dbname", "demo");
props.setProperty("topic.prefix", "pgdemo");
props.setProperty("plugin.name", "pgoutput"); // Use PostgreSQL's native logical replication plugin
props.setProperty("slot.name", "debezium_demo_slot"); // Use a fixed replication slot to avoid repeated creation
props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
props.setProperty("offset.storage.file.filename", "/tmp/pg-offset.dat"); // Persist offsets
props.setProperty("offset.flush.interval.ms", "1000");
props.setProperty("table.include.list", "public.users"); // Listen only to the target table
This configuration defines the minimum runnable CDC setup, including the PostgreSQL connection, replication slot, offset storage, and table scope.
PostgreSQL must enable logical replication first.
To make Debezium work, the database must satisfy the WAL logical replication requirements. The three most important parameters are wal_level, max_replication_slots, and max_wal_senders.
At the same time, the application database user must have replication privileges. If permissions are insufficient, the connector usually fails during startup rather than during event consumption.
ALTER SYSTEM SET wal_level = logical;
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET max_wal_senders = 10;
-- Grant replication capability to the CDC user
ALTER ROLE postgres REPLICATION;
-- Create a test table
CREATE TABLE public.users (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(200),
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
This SQL enables the prerequisites for logical replication and prepares a sample table that you can use to verify CDC behavior.
The image indicates the source came from a column page rather than the full article body.
AI Visual Insight: This animated image looks more like a placeholder demo image from a blog post than a system architecture diagram. It does not directly show PostgreSQL, replication slots, WAL, or Debezium event flow details, which suggests that the main body of the original page was hidden behind a subscription wall. The currently visible information mostly consists of the article title, column attribution, and site interface elements.
Java applications can consume Debezium events directly.
The Embedded Engine is essentially a hostable thread. After the application starts, it initializes the engine, registers a callback function, and dispatches each batch of records into its own processing logic.
This allows developers to complete collection, transformation, and delivery inside a single process. It is especially suitable for writing into Elasticsearch, Redis, or a custom event bus.
DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
.using(props)
.notifying(records -> {
for (ChangeEvent<String, String> record : records) {
String key = record.key();
String value = record.value();
System.out.println("CDC Key: " + key);
System.out.println("CDC Value: " + value); // Output the changed JSON event
}
})
.build();
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(engine); // Run the CDC engine in a dedicated thread
This example shows the most essential startup model for the Embedded Engine: load configuration, subscribe to events, and run asynchronously.
A change event usually contains before, after, and the operation type.
The JSON emitted by Debezium typically contains fields such as payload.before, payload.after, payload.op, and payload.ts_ms.
Here, c means insert, u means update, d means delete, and r commonly appears in snapshot reads.
If you only care about the latest state, you can read after directly. If you need auditing, diff comparison, or replay, you must retain both before and op.
{
"payload": {
"before": {"id": 1, "name": "Tom"},
"after": {"id": 1, "name": "Tommy"},
"op": "u",
"ts_ms": 1777479534000
}
}
This example shows that a Debezium event is not just row data. It is a change description with context.
Production deployments must focus on offsets, replication slots, and idempotent consumption.
The offset file determines where reading resumes after a restart. The replication slot determines whether WAL continues to be retained. If the application exits unexpectedly and does not consume for a long period, PostgreSQL WAL may accumulate.
For that reason, production deployments should monitor replication lag, slot backlog, connection status, and retry counts. The consumer side must also guarantee idempotency to avoid duplicate writes during replay.
These practical recommendations help avoid common issues.
- Use a separate
slot.namefor each environment. - Restrict
table.include.listprecisely to avoid scanning the whole database. - Store the offset file on persistent disk.
- Add downstream deduplication based on business primary keys.
- Verify PostgreSQL and Debezium version compatibility before upgrades.
FAQ provides structured answers to common questions.
1. Why choose Debezium Embedded Engine instead of Kafka Connect?
The Embedded Engine is lighter to deploy and fits scenarios where you want to integrate CDC directly into a Java service. If you need centralized governance, multi-source management, and large-scale distribution, Kafka Connect is the better choice.
2. What are the most common pitfalls in PostgreSQL CDC?
The most common issues are logical replication not being enabled, insufficient replication privileges, long-term replication slot backlog causing WAL growth, and offsets not being persisted, which leads to duplicate consumption after restart.
3. What are the most common downstream use cases after Debezium captures events?
Common scenarios include search index synchronization, cache invalidation notifications, audit logs, data subscription delivery, and real-time incremental synchronization from OLTP to OLAP.
The core summary reconstructs the implementation path for PostgreSQL CDC.
This article reconstructs the implementation path for PostgreSQL CDC and focuses on the minimum viable approach with the Debezium Embedded Engine. It covers architecture principles, environment configuration, Java integration, event consumption, and common issues, helping developers build a real-time data change pipeline at low cost.