Don't corrupt your migration
Use an Anticorruption layer.
Anticorruption is an architectural approach, part of domain-driven design, for translating data between semantically incompatible systems.
For data analytics migration projects, an Anticorruption layer is a good fit. On a typical data analytics platform, source data is copied in its raw form to a 'raw' data lake container, or a 'staging' schema in a database, for the platform to ingest. Migration projects are different: the aim is to deprecate the legacy system, not to keep ingesting from it.
There will be many reasons why the migration project is happening. It's likely that the legacy system is slow or on old hardware/software, or its data model has become hard to use. It's therefore likely that the replacement system will have a new data model with new keys, structures and semantics.
We probably don't want some of the legacy data points to end up in the new system. We just want meaningful business keys and related meaningful attributes.
With an agile approach, it's likely the legacy system and the new system will need to run in parallel until the new system meets all requirements and all data has been migrated. At the end of the project, both the legacy system and the Anticorruption layer can be deprecated.
How do we accomplish this with an Anticorruption layer?
The Anticorruption layer will need to ingest the legacy system's business keys, primary keys and attributes, and the new system's business keys and primary keys. If attributes will be changed directly in the new system, those attributes will also need ingesting into the Anticorruption layer so they can be written back to the legacy system. The goal of the Anticorruption model is to use the business keys, which are common to both systems, to find the primary keys of both the legacy and new systems. In effect, this layer synchronises the primary keys of the two systems by way of the business keys. The attributes are there as a step in the migration data load from one system to the other.
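The key matching described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the names `KeyMapping` and `build_key_map`, and the column names `legacy_id`, `new_id` and `business_key`, are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyMapping:
    """One row of a hypothetical Anticorruption key map."""
    business_key: str
    legacy_pk: Optional[int] = None  # primary key in the legacy system
    new_pk: Optional[int] = None     # primary key in the new system, once it exists


def build_key_map(legacy_rows, new_rows):
    """Pair legacy and new primary keys through the shared business key."""
    key_map = {}
    for row in legacy_rows:
        key_map[row["business_key"]] = KeyMapping(
            business_key=row["business_key"], legacy_pk=row["legacy_id"]
        )
    for row in new_rows:
        # A record created directly in the new system has no legacy key yet.
        mapping = key_map.setdefault(
            row["business_key"], KeyMapping(business_key=row["business_key"])
        )
        mapping.new_pk = row["new_id"]
    return key_map


# Illustrative data only (assumed column names).
legacy_rows = [
    {"legacy_id": 101, "business_key": "CUST-001"},
    {"legacy_id": 102, "business_key": "CUST-002"},
]
new_rows = [{"new_id": 1, "business_key": "CUST-001"}]

key_map = build_key_map(legacy_rows, new_rows)
```

Here CUST-001 ends up with both primary keys, while CUST-002 has only a legacy key because it has not yet been loaded into the new system.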
Here's how a migration data flow may look:
As this data flow continues to run, the legacy data in the Anticorruption model continues to get enhanced with the primary keys from the New model. This enables the New ETL/ELT/ETLT/ELTLTLTLTL to handle updates using existing primary keys which have looped back around through Anticorruption.
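As a rough sketch of that loop, the load step could split the Anticorruption rows into inserts and updates depending on whether a new-system primary key has already looped back. The function `plan_load` and the field names `new_pk`, `legacy_pk`, `business_key` and `name` are hypothetical, chosen only for this example.

```python
def plan_load(anticorruption_rows):
    """Split rows into inserts and updates for the new system.

    A row whose new_pk is still None has never been loaded, so it is inserted
    by business key. A row with a new_pk is updated using that key. The legacy
    primary key is never passed on to the new system.
    """
    inserts, updates = [], []
    for row in anticorruption_rows:
        attributes = {"name": row["name"]}
        if row["new_pk"] is None:
            inserts.append({"business_key": row["business_key"], **attributes})
        else:
            updates.append({"id": row["new_pk"], **attributes})
    return inserts, updates


# Illustrative rows as they might sit in the Anticorruption model (assumed shape).
anticorruption_rows = [
    {"business_key": "CUST-001", "legacy_pk": 101, "new_pk": 1, "name": "Acme Ltd"},
    {"business_key": "CUST-002", "legacy_pk": 102, "new_pk": None, "name": "Bolt plc"},
]

inserts, updates = plan_load(anticorruption_rows)
```

On the next run, once the new system has assigned a primary key to CUST-002 and that key has been ingested back into the Anticorruption model, the same record would fall into the update list instead.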
The New system is not corrupted by the legacy system's primary keys or, by extension, its data model. The legacy and Anticorruption systems can both be deprecated once the New project is fully commissioned.
The downsides are the extra 'hop' in the data flow needed to merge the two systems in the Anticorruption layer, and the temporary cost of running that layer. However, much of the cleansing and remodelling would likely have to be done anyway, and this design approach has the benefit of a clean new system.