Smooth Business Data Migration in Practice: Dual Writes, Backfill, Reconciliation, and Gradual Traffic Cutover

This methodology helps backend architects migrate business data smoothly without interrupting production traffic. Its core goal is to complete table schema, storage engine, or data model upgrades without downtime. The approach controls risk through dual writes, backfill, reconciliation, and gradual traffic cutover. Keywords: data migration, dual-write consistency, canary release.

Technical Specification Snapshot

Domain: Backend Architecture / Data Migration / Canary Release
Primary Languages: Java, SQL
Related Protocols: JDBC, HTTP (commonly used in monitoring and management planes)
GitHub Stars: Not applicable; this is a technical blog article
Core Dependencies: MySQL/relational databases, NoSQL, job scheduling systems, monitoring and alerting systems
Target Problems: Consistency, observability, and rollback capability during old-to-new storage cutover

Business Data Migration Must Progress in Stages

Business data migration usually appears in two scenarios. First, the business model changes, which alters entity relationships or field semantics. Second, the storage structure is upgraded, such as replacing an old table with a new one, or moving from a relational database to HBase, Elasticsearch, or another storage system.

The real challenge in this kind of refactor is not writing code. It is switching systems without interrupting the business. Without canary controls and reconciliation mechanisms, you can easily end up with inconsistencies between old and new data, drift in query results, or offline jobs reading from the wrong source.

An Executable Migration Workflow

Old write path -> Incremental dual writes -> Historical backfill -> Full reconciliation -> Incremental dual-read comparison -> Gradually switch reads to the new source -> Stop legacy writes

The core value of this workflow is that it breaks a high-risk migration into multiple verifiable stages. Every step is observable, pausable, and reversible.

Incremental Dual Writes Act as the Safety Valve at Migration Startup

The first step is not to switch traffic immediately. Instead, make sure all newly written data lands in both the old storage and the new storage. If both writes happen in the same database and transaction, you can place writes to the old and new tables in a single transaction and guarantee consistency directly.
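For the single-database case, a minimal sketch might wrap both writes in one transaction; the @Transactional annotation and the oldRepo/newRepo repositories here are illustrative assumptions, not code from the original system.

@Transactional // Assumes Spring-managed transactions over a shared DataSource
public void saveInOneTransaction(Order order) {
    oldRepo.save(order); // Write to the legacy table
    newRepo.save(order); // Write to the new table inside the same transaction
    // Any failure rolls back both writes, so the two tables cannot diverge
}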

If dual writes span databases or storage systems, such as migrating from MySQL to HBase, you usually cannot rely on a single transaction. In that case, the common strategy is to write to the old store first and then write to the new store, with retries, compensation, and alerting for failed writes.

AI Visual Insight: The image shows the dual-write path at the beginning of migration. A business write request first enters the legacy store and then writes to the new store synchronously or asynchronously. This structure emphasizes that the legacy path remains the primary production path, while the new path is introduced gradually. It works well for validating write success rate, latency, and compensation effectiveness under real traffic.

public void save(Order order) {
    oldRepo.save(order); // Write to the legacy database first to keep the primary production path stable
    try {
        newRepo.save(order); // Then write to the new database and gradually build the new data plane
    } catch (Exception e) {
        retryQueue.send(order.getId()); // Send failed writes to the new database into the compensation queue
    }
}

This code demonstrates the basic idea behind cross-storage dual writes: prioritize the legacy path, fill the new path, and reduce the inconsistency window with a compensation mechanism.

Historical Data Backfill Fills Gaps Left by Legacy Records

Dual writes only cover newly generated data. They do not automatically handle historical records. That is why the second step must run a backfill process that exports the business primary keys, reads the corresponding historical records from the old table, and writes them into the new table.

Run backfill jobs by shard whenever possible, and record progress, watermarks, and failure details. This approach controls job duration, makes reruns easier, and avoids the production impact of a one-time full-table scan.
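A minimal sketch of such a shard-driven backfill loop is shown below; the watermark helpers, copyRange, recordFailure, and the shard size are assumptions made for illustration rather than a definitive implementation.

public void backfillByShard(long maxBizId, long shardSize) {
    long watermark = loadWatermark(); // Resume from the last persisted position
    while (watermark < maxBizId) {
        long upper = Math.min(watermark + shardSize, maxBizId);
        try {
            copyRange(watermark, upper);        // Copy [watermark, upper) from the old table to the new table
            saveWatermark(upper);               // Persist progress so a rerun starts from here
        } catch (Exception e) {
            recordFailure(watermark, upper, e); // Keep the failed range and reason for a targeted rerun
            break;                              // Stop and alert instead of silently skipping data
        }
        watermark = upper;
    }
}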

AI Visual Insight: The image shows the batch path for offline backfill. It first exports business IDs, then reads by ID from the old table and writes into the new table. This pattern highlights the division of responsibilities: incremental dual writes protect future data, while offline backfill repairs historical data. It is especially suitable for large-table migrations executed in parallel batches.

Backfill Jobs Should Be Idempotent

INSERT INTO new_table (biz_id, field_1, field_2)
SELECT biz_id, field_1, field_2
FROM old_table
WHERE biz_id >= ? AND biz_id < ?;

This SQL migrates historical data by range. In production, combine it with unique keys, UPSERT semantics, or overwrite writes to ensure that repeated execution does not create dirty data.
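On MySQL, for example, one way to make the statement safe to re-run is an upsert keyed on a unique index; this sketch assumes new_table has a unique key on biz_id.

-- Idempotent variant: assumes a unique key on new_table.biz_id
INSERT INTO new_table (biz_id, field_1, field_2)
SELECT biz_id, field_1, field_2
FROM old_table
WHERE biz_id >= ? AND biz_id < ?
ON DUPLICATE KEY UPDATE
    field_1 = VALUES(field_1),
    field_2 = VALUES(field_2);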

Full Reconciliation Is the Hard Gate Before Read Cutover

Even after dual writes are fully enabled and backfill has completed, you still should not switch reads immediately. Completing the backfill job does not mean the data is fully consistent. You must perform full reconciliation to confirm that the old and new datasets match in both record count and content.

You can reuse the scanning capability of the backfill program and switch it into compare-only mode, or build an independent offline job that validates record counts, primary key coverage, and field-level differences.

-- 1. Compare whether the total counts are consistent
SELECT COUNT(1) FROM old_table;
SELECT COUNT(1) FROM new_table;

-- 2. Find records that exist only in the old table
SELECT *
FROM old_table a
LEFT JOIN new_table b ON a.biz_id = b.biz_id
WHERE b.biz_id IS NULL;

-- 3. Find records whose field values are inconsistent
SELECT *
FROM old_table a
INNER JOIN new_table b ON a.biz_id = b.biz_id
WHERE a.field_1 <> b.field_1
   OR a.field_2 <> b.field_2
   OR a.field_3 <> b.field_3;

These SQL statements validate three minimum acceptance goals for migration: matching counts, matching primary keys, and matching content.
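For very large tables, a full field-by-field join can be expensive. One common optimization, sketched here for MySQL with the column names assumed from the examples above, is to compare per-range checksums first and only drill into ranges that differ.

-- Run the same statement against old_table and new_table and diff the output
SELECT FLOOR(biz_id / 100000) AS bucket,
       COUNT(1) AS row_cnt,
       SUM(CRC32(CONCAT_WS('|', biz_id, field_1, field_2, field_3))) AS checksum
FROM old_table
WHERE biz_id >= ? AND biz_id < ?
GROUP BY FLOOR(biz_id / 100000);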

Incremental Dual-Read Comparison Validates the Real Query Path

Full reconciliation mainly covers static results, but production issues often surface in the query path. For example, a different index design in the new table, an incorrect field mapping, or a change in serialization rules can all produce inconsistent read results.

That is why the fourth step should implement dual-read comparison in the shared query layer. The same request reads from both the old and new tables, compares the results, and either returns the old result or gradually returns the new result to a limited audience. This lets you validate the new read path without affecting users.

AI Visual Insight: The image shows a dual-read validation architecture. The same query request accesses both old and new data sources in parallel, and the application compares the results. The point of this design is not performance. It is to detect hidden differences in field mapping, sort order, default values, and aggregation logic.

public Order query(Long bizId) {
    Order oldData = oldRepo.query(bizId); // Use the legacy path as the baseline result
    Order newData = newRepo.query(bizId); // Use the new path for real-time validation
    if (!Objects.equals(oldData, newData)) {
        monitor.reportDiff(bizId, oldData, newData); // Report differences to monitoring
    }
    return oldData; // During the canary phase, return the legacy result first to avoid business impact
}

This code illustrates the core principle of dual-read comparison: validate first, then cut over, rather than replace first and fix later.

Switching Reads and Stopping Legacy Writes Requires Dependency Cleanup First

Only after dual-read comparison remains stable over time should you gradually shift query traffic to the new table. At the same time, verify that all offline jobs, reports, search indexes, and downstream consumers have already switched to the new data source.

If you switch online reads but offline jobs still depend on the old table, stopping legacy writes later will create missing metrics or data gaps. That is why dependency inventory is a mandatory step before cutover.
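Once the dependency inventory is clean, the read cutover itself can be driven by a percentage switch instead of a hard flip. A minimal sketch follows, assuming a hypothetical dynamically configurable readPercent value between 0 and 100 that is raised in small steps.

public Order queryWithCanary(Long bizId) {
    int bucket = (int) Math.floorMod(bizId, 100L); // Stable 0-99 bucket derived from the business key
    if (bucket < config.getReadPercent()) {        // Raise readPercent gradually, e.g. 1, 5, 20, 50, 100
        return newRepo.query(bizId);               // Canary slice reads from the new source
    }
    return oldRepo.query(bizId);                   // Everyone else stays on the legacy path
}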

AI Visual Insight: The image shows the state after read traffic has switched to the new table. The old table still retains write capability or fallback capability, while the main query path has already moved to the new store. This is the typical canary read-cutover stage, used to observe production hit rates, query latency, and error distribution.

AI Visual Insight: The image shows the final stage of migration. The legacy write path is shut down, and the new table becomes the only data source. This phase usually means that all online and offline dependencies have completed the transition, and the system has entered a stable operating period under the new architecture.

Migration Completion Checklist

1. Have all online queries switched to the new table?
2. Have all offline jobs switched to the new data source?
3. Has the reconciliation job remained stable with no differences over time?
4. Have alerting, retries, and compensation queues been fully drained?
5. Has the rollback plan been rehearsed?

This checklist ensures that stopping legacy writes represents business completion, not just technical completion.

The Essence of Smooth Migration Is Turning the Uncontrollable Into the Verifiable

Business data migration has no silver bullet, but it does follow a reliable pattern: incremental dual writes protect future data, backfill repairs historical data, reconciliation and dual reads validate consistency, gradual read cutover limits blast radius, and finally stopping legacy writes completes the migration.

For architects, the real key is not any specific SQL statement or framework. It is whether you have established a complete validation chain, observability mechanism, and rollback path. Once those capabilities are in place, even complex migrations can remain under control.

FAQ

FAQ 1: Why not just perform a one-time full data export and switch everything at once?

Because the business keeps writing data continuously. A one-time full export only covers data up to a specific point in time. Without incremental dual writes and subsequent dual-read comparison, the cutover can easily introduce missing data and read/write inconsistencies.

FAQ 2: During cross-database or cross-NoSQL migration, what is the most important mitigation for dual-write failures?

The core answer is a compensation mechanism plus monitoring and alerting. A common approach is to make the legacy path succeed first, then write failed new-path operations into a retry queue while recording the business key, failure reason, and retry count so that eventual consistency remains traceable.
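A minimal sketch of such a compensation consumer is shown below; the alertService, maxRetries, and the retryQueue signature are assumptions reused from the earlier illustrative examples.

public void compensate(Long bizId, int retryCount) {
    try {
        Order order = oldRepo.query(bizId);         // Re-read the legacy store as the source of truth
        newRepo.save(order);                        // Replay the missed write into the new store
    } catch (Exception e) {
        if (retryCount >= maxRetries) {
            alertService.escalate(bizId, e);        // Hand over to manual repair after repeated failures
        } else {
            retryQueue.send(bizId, retryCount + 1); // Requeue with the retry count for traceability
        }
    }
}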

FAQ 3: When can you truly stop writes to the old table?

You must satisfy all three conditions at the same time: online queries have switched to the new source, offline dependencies have migrated, and reconciliation has shown no differences continuously. If only one or two conditions are met, stopping legacy writes is still risky because hidden dependencies may remain.

AI Readability Summary: This article turns business data migration into a practical SOP with six stages: incremental dual writes, historical backfill, full reconciliation, dual-read comparison, switching reads to the new source, and stopping legacy writes. It focuses on the highest-risk migration problems, including inconsistency between old and new data, difficult rollback, and overlooked dependencies.