Rebuilding the warehouse without downtime — a field guide.

The hardest thing about a warehouse migration is not the warehouse. It is the 1,400 reports, 200 analysts, 17 integrations, three regulators and one CFO who all expect Monday's numbers to be identical to last Monday's — while you rebuild the thing that produced them.

We have led a dozen of these. Some have taken five months, some have taken eighteen. The ones that succeeded followed the same pattern. This article is the field guide.

The five failure modes

Before the pattern, the failures we are trying to avoid.

The big-bang cutover. Everything moves at 11pm on Saturday. Nothing recovers by 9am on Monday. You end up running two systems in parallel forever.
The shadow migration. Engineering rebuilds the warehouse in a new stack; analysts never move; six months later nothing is retired.
The rewrite-everything. "While we're at it" — new data model, new naming, new KPIs. The business cannot tie back to any previous report. Trust collapses.
The silent regression. A pipeline in the new stack produces a subtly different number. Nobody notices for four months. By the time they do, the number is in the accounts.
The capability gap. Nobody on the client side can operate the new stack. The migration is successful; the platform is abandoned.

The pattern below is designed to make each of these unlikely.

The pattern: strangler by domain

We migrate one domain at a time, running old and new in parallel, with reconciliation in production. We never cut over the whole warehouse. The old system keeps serving each domain until the new version of that domain has been proven, in production, for sixty days.

Phase 0 — The audit (2–4 weeks)

Before we move anything, we measure what is actually used. The answer is almost always "less than the dashboards suggest". On a recent engagement, usage telemetry showed that 83% of dashboards had fewer than three users per month. We do not migrate those. We retire them, with owner sign-off.
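
As an illustration of the usage cut, here is a minimal sketch in Python. It assumes telemetry can be exported as (dashboard, user, month) rows. The function name, the row shape and the reading of "fewer than three users per month" as "never reaches three distinct users in any month" are our assumptions for the example, not a fixed rule.

```python
from collections import defaultdict


def retirement_candidates(views, all_dashboards, min_users=3):
    """Return dashboards whose peak monthly distinct-user count is below min_users.

    views: iterable of (dashboard, user, month) tuples (hypothetical export format).
    all_dashboards: every known dashboard, so those with no views at all are included.
    """
    users = defaultdict(set)  # (dashboard, month) -> distinct users seen
    for dashboard, user, month in views:
        users[(dashboard, month)].add(user)

    peak = {d: 0 for d in all_dashboards}
    for (dashboard, _month), seen in users.items():
        if dashboard in peak:
            peak[dashboard] = max(peak[dashboard], len(seen))

    return sorted(d for d, n in peak.items() if n < min_users)


views = [
    ("sales", "a", "2024-01"), ("sales", "b", "2024-01"), ("sales", "c", "2024-01"),
    ("ops", "a", "2024-01"), ("ops", "a", "2024-02"),
]
candidates = retirement_candidates(views, ["sales", "ops", "old"])
```

The output is only a candidate list. Retirement still needs owner sign-off.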

Audit outputs:

Top-N report list, ranked by usage and business criticality.
Domain map: which reports depend on which schemas, which schemas depend on which sources.
A ranked migration order, by domain, driven by business criticality and technical coupling.
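
One way to turn the domain map into a ranked order is sketched below. The scoring is an assumption made for illustration: least-coupled domains go first, and ties are broken by higher business criticality. The field names and the 1 to 5 criticality scale are hypothetical, and a real engagement would weigh these factors with the business.

```python
def migration_order(domains):
    """Rank domains: lowest coupling first, then highest criticality.

    domains: name -> {"criticality": int, "depends_on": set of domain names}
    Coupling counts both directions: what a domain depends on and what depends on it.
    """
    coupling = {name: len(info["depends_on"]) for name, info in domains.items()}
    for info in domains.values():
        for dep in info["depends_on"]:
            if dep in coupling:
                coupling[dep] += 1

    return sorted(
        domains,
        key=lambda n: (coupling[n], -domains[n]["criticality"], n),
    )


domains = {
    "receivables": {"criticality": 4, "depends_on": set()},
    "sales": {"criticality": 5, "depends_on": {"customers"}},
    "customers": {"criticality": 3, "depends_on": set()},
}
order = migration_order(domains)
```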

Phase 1 — Foundation (4–8 weeks)

Stand up the new stack before any domain migrates. Governance, identity, CI/CD, observability, cost controls, data-product templates. Do not build pipelines yet. Build the factory first.

On our last lakehouse build, the foundation phase was six weeks and shipped nothing that an analyst could see. This is correct. If the foundation is right, every domain that follows is faster. If it is wrong, every domain bakes in the mistake.

Phase 2 — Domain migration (ongoing, N×4–6 weeks)

Each domain follows the same protocol:

Parallel ingest. Wire the new stack to the same sources as the old. Do not interrupt the old.
Parallel model. Rebuild the domain's tables and metrics in the new stack. Preserve naming unless there is a strong reason to change.
Reconciliation in production. Run a daily job that compares key metrics in the two systems. Investigate every variance above the agreed tolerance. Sign off row-level agreement on the top-N metrics.
Dual-publish reports. Analysts see both sets side by side, labelled clearly. They do not yet rely on the new one.
Sixty-day parallel run. Only after sixty days of green reconciliation does the old pipeline retire.
Retire. Turn off the old pipeline. Celebrate. Move to the next domain.
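
The retirement gate in this protocol can be expressed as a simple check. The sketch below assumes the daily reconciliation job records one pass/fail result per calendar day, and it treats "sixty days of green" as sixty consecutive days ending today, with a missing day counted as a failure. Both the storage shape and that interpretation are assumptions.

```python
from datetime import date, timedelta


def ready_to_retire(results, today, window_days=60):
    """True only if every one of the last window_days days has a green result.

    results: date -> bool, where True means all variances were within tolerance.
    A day with no recorded result is treated as not green.
    """
    days = (today - timedelta(days=offset) for offset in range(window_days))
    return all(results.get(day) is True for day in days)


today = date(2024, 3, 31)
green = {today - timedelta(days=i): True for i in range(60)}

one_red = dict(green)
one_red[today - timedelta(days=10)] = False

one_missing = dict(green)
del one_missing[today - timedelta(days=59)]
```

Counting a missing day as a failure is deliberate: a reconciliation job that did not run has proven nothing.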

Phase 3 — Consolidation (4–8 weeks)

Once the last domain is on the new stack, spend this phase retiring what is left: orphaned jobs, unused tables, forgotten reports. Every engagement we have finished has left behind 10–20% of the legacy estate that nobody is willing to turn off without formal retirement governance. Plan for this explicitly.

The reconciliation discipline

Reconciliation is where migrations fail silently. The key principles:

Rule of thumb

Reconcile in production, not in test.

Test environments never have the real data volume, the real sequence of inserts, or the real late-arriving corrections. The only honest reconciliation is production vs. production.

Row-level for the top-N metrics. Not aggregate-level. Aggregates can agree to the penny while individual rows are wrong.
Daily, automated, visible. A reconciliation report that nobody reads is not a reconciliation.
Owned variance tolerance. The business owner — not engineering — sets the tolerance for each metric.
Documented exceptions. When you intentionally change a metric definition (it happens), document it and sign it off with Finance.
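
To make the row-level principle concrete, here is a minimal sketch of a comparison keyed by a business identifier. The key names, the use of integer cents and the single tolerance argument are assumptions for the example. In practice the tolerance would come from the business owner of each metric.

```python
def reconcile(old_rows, new_rows, tolerance=0):
    """Compare two systems row by row and return every variance.

    old_rows / new_rows: key -> amount (integer cents, to avoid float rounding).
    A row present in only one system is always reported, with None for the missing side.
    """
    variances = []
    for key in sorted(old_rows.keys() | new_rows.keys()):
        old, new = old_rows.get(key), new_rows.get(key)
        if old is None or new is None or abs(old - new) > tolerance:
            variances.append((key, old, new))
    return variances


# Hypothetical invoices: the totals agree, the individual rows do not.
old = {"INV-1": 10000, "INV-2": 5000}
new = {"INV-1": 9000, "INV-2": 6000}

totals_match = sum(old.values()) == sum(new.values())
variances = reconcile(old, new)
```

Here the aggregate check passes while both rows are flagged, which is the case an aggregate-only reconciliation would miss.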

What goes in the first domain

Pick a domain that is:

Important enough that retiring the old pipeline matters
Small enough to fit in 4–6 weeks
Well-understood — pick a loved domain, not a disputed one

On a recent engagement we started with Accounts Receivable. Well-defined, owned by Finance, bounded in scope, and a visible early win. Four weeks in, a named Finance sponsor was defending the new stack in executive reviews. That sponsor unlocked the next six domains.

People, not pipelines

The technical work is the smaller half. The larger half is changing the organisation's habits. Analysts learn a new stack. Product owners learn a new governance model. Data consumers learn new URLs. If this shift is not part of the plan, a technically successful migration produces a culturally abandoned platform.

On every engagement, we commit a named consultant to capability transfer from day one. By the end, your people run it.
