Data is always underestimated on new implementations. From the initial data design through data quality, the migration, and the final data validation, this underestimation has been the one constant in my 5+ years as a data migration consultant. It frequently increases the risk that data will delay the project. There are several choices a project team can make to minimize the risk behind data migration, some more obvious than others. One often overlooked risk mitigation technique, and one that is always missing when we rescue projects, is the use of a centralized data repository.

A data repository is a general term for a destination designated for data storage. Many IT experts use the term to refer to a specific setup within an overall IT structure, such as a group of databases, a data warehouse, or a data lake. Most of the existing literature focuses on how a data repository is key for data analytics and reporting. In addition to analytics, data repositories provide significant benefits within the data conversion space. Two important benefits are cross-system translation and the ability to take an external snapshot.

Cross-System Translation

One of the biggest benefits of having a centralized repository is that it allows all data sources to exist together in one place. These data sources include not only data from legacy systems but also any supplemental data, such as cross-references and data stored in flat files. Once all the data is in a single location, conversion, analytics, and reporting can occur without worrying about the impact on the existing production systems.

One critical type of analysis that is frequently performed is cross-system de-duplication. Duplicates within and across the data landscape have both technical and business impacts. Technically, records that are duplicated across key fields will not load in most cases. This results in missing or incorrect attributes and significant extra work, usually under high pressure, to resolve. A larger concern is the impact of duplicated data that is never cleaned and gets loaded anyway. This duplicated data can directly and severely impact an organization's bottom line. One Client estimated that they would save 30% annually, tens of millions of dollars, on their raw material purchases once their suppliers and items were de-duplicated.
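To make the idea concrete, here is a minimal sketch of how cross-system duplicate candidates might be surfaced once all supplier records sit in one place. It is an illustration only, not how any particular tool does it: the record layout (`source`, `id`, `name`), the suffix list, and the function names are all assumptions, and exact matching on a normalized name is a deliberately simple stand-in for real matching logic.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_name(name):
    """Lowercase the name, replace punctuation with spaces, drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def find_duplicate_candidates(records):
    """Group records by normalized name and keep groups with more than one record.

    Each record is a dict with the assumed keys 'source', 'id', and 'name'.
    The groups may span several source systems or sit within a single one.
    """
    groups = defaultdict(list)
    for record in records:
        groups[normalize_name(record["name"])].append(record)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}
```

Calling `find_duplicate_candidates` on records pulled from two legacy systems returns only the groups that need a business decision. The result is a list of candidates for review, not a list of confirmed duplicates.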

Another benefit of converting from a data repository is that it makes cross-system migration rules easier to apply. A Client was migrating several disparate legacy systems to Oracle Cloud ERP. For a single business entity, they stored supplier data in a Rollcim database and purchase order data in an Infor database. The business rule for conversion was to select the suppliers with open purchase orders. With the supplier and purchase order data living in different systems, the central location made it simple to execute the selection and to identify integrity issues between Infor and Rollcim well before each test cycle and go-live.
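A rule like this becomes a pair of simple joins once both extracts are loaded side by side. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the repository. The table names, column names, status values, and sample rows are all hypothetical and are not taken from the Client's systems.

```python
import sqlite3


def build_repository():
    """Load two hypothetical legacy extracts into one in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE legacy_suppliers (
            supplier_id TEXT PRIMARY KEY,
            name        TEXT
        );
        CREATE TABLE legacy_purchase_orders (
            po_number   TEXT PRIMARY KEY,
            supplier_id TEXT,
            status      TEXT
        );
        INSERT INTO legacy_suppliers VALUES
            ('S1', 'Acme'), ('S2', 'Globex'), ('S3', 'Initech');
        INSERT INTO legacy_purchase_orders VALUES
            ('PO-1', 'S1', 'OPEN'),
            ('PO-2', 'S2', 'CLOSED'),
            ('PO-3', 'S9', 'OPEN');
        """
    )
    return conn


def suppliers_in_scope(conn):
    """Selection rule: suppliers that have at least one open purchase order."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.supplier_id
        FROM legacy_suppliers AS s
        JOIN legacy_purchase_orders AS po ON po.supplier_id = s.supplier_id
        WHERE po.status = 'OPEN'
        ORDER BY s.supplier_id
        """
    ).fetchall()
    return [row[0] for row in rows]


def orphaned_open_orders(conn):
    """Integrity check: open purchase orders whose supplier is missing."""
    rows = conn.execute(
        """
        SELECT po.po_number
        FROM legacy_purchase_orders AS po
        LEFT JOIN legacy_suppliers AS s ON s.supplier_id = po.supplier_id
        WHERE po.status = 'OPEN' AND s.supplier_id IS NULL
        ORDER BY po.po_number
        """
    ).fetchall()
    return [row[0] for row in rows]
```

The second query is the more valuable one in practice: an open purchase order that points at a supplier the other system does not know about is exactly the kind of integrity issue that is cheap to fix early and expensive to find during a load.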

External Snapshot

A second risk-reducing benefit of the data repository is that it naturally facilitates the ability to take an external snapshot. This snapshot contains all legacy data sources, supplemental data sources, pre-conversion data from the target applications (including configuration), and post-conversion data from the target applications. These snapshots are completely frozen and stable. This stability provides a solid reference point for questions and a base for all downstream conversion activities.

One large benefit of having this solid reference point is that it can facilitate a large portion of the data validation process. It does this by making the data at each stage of the migration available for review. Because the data at every stage is captured as of a fixed point in time, it is possible to directly compare the legacy and post-conversion data without having to worry about data changes over time. The centralization of data accelerates the validation process by providing all relevant data as it was at the time of conversion.
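As a rough illustration of that comparison, the sketch below reconciles a frozen legacy snapshot against a frozen post-conversion snapshot. It assumes both snapshots have already been keyed by the same identifier (for example through a cross-reference) and that the compared fields have been translated to a common form. The function name and the dictionary layout are assumptions made for the example.

```python
def compare_snapshots(legacy, converted, fields):
    """Reconcile two snapshots that are keyed by the same record identifier.

    legacy and converted map an identifier to a dict of field values.
    Returns the keys missing from the converted data, the keys that appear
    only in the converted data, and the field-level differences for keys
    that appear in both.
    """
    legacy_keys = set(legacy)
    converted_keys = set(converted)

    mismatched = {}
    for key in sorted(legacy_keys & converted_keys):
        # Record (legacy value, converted value) for every field that differs.
        diffs = {
            field: (legacy[key].get(field), converted[key].get(field))
            for field in fields
            if legacy[key].get(field) != converted[key].get(field)
        }
        if diffs:
            mismatched[key] = diffs

    return {
        "missing": sorted(legacy_keys - converted_keys),
        "unexpected": sorted(converted_keys - legacy_keys),
        "mismatched": mismatched,
    }
```

Because both inputs are frozen, any difference this reports comes from the conversion itself rather than from someone editing the legacy system after the extract was taken.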

In addition to capturing the data landscape through an individual conversion cycle, the repository can also preserve the data at multiple points in time throughout the entire implementation. This allows the team to compare the converted data from test cycle to test cycle, which prevents the need to re-validate data that’s already been validated, helps confirm that the expected record counts are accurate, and makes it possible to quickly assess every enhancement to the conversion business requirements.
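One way such a cycle-to-cycle comparison could work is to fingerprint each converted record and compare the fingerprints between two preserved cycles. This is a sketch under stated assumptions, not a description of any specific product: the function names are hypothetical, and each cycle's converted data is assumed to be a dictionary keyed by a stable record identifier.

```python
import hashlib
import json


def fingerprint(record):
    """Return a stable hash of a record's contents, independent of key order."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_cycles(previous, current):
    """Classify the records of the current cycle against the previous cycle.

    Records in 'unchanged' were converted identically in both cycles, so an
    earlier validation of them can be reused. The other three lists are the
    ones that need attention and that explain any shift in record counts.
    """
    prev_fp = {key: fingerprint(rec) for key, rec in previous.items()}
    curr_fp = {key: fingerprint(rec) for key, rec in current.items()}
    return {
        "unchanged": sorted(k for k in curr_fp if prev_fp.get(k) == curr_fp[k]),
        "changed": sorted(
            k for k in curr_fp if k in prev_fp and prev_fp[k] != curr_fp[k]
        ),
        "added": sorted(k for k in curr_fp if k not in prev_fp),
        "dropped": sorted(k for k in prev_fp if k not in curr_fp),
    }
```

The count of current records always equals the unchanged, changed, and added counts combined, which gives a quick check on expected record counts. When a conversion rule is enhanced between cycles, the `changed` list shows exactly which records the enhancement touched.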

In Conclusion

A central data repository is a critical part of Premier’s Best Practices and a key feature of our data migration software, Applaud®. We often rescue struggling implementations, and one of several commonalities of these projects is that none of them utilized a centralized repository (read rescue story). One of the first things that we do when a Client is struggling is implement our repository so we can immediately analyze the data landscape and understand the current issues. The power and effectiveness of a central repository can have an immeasurable impact on implementations. While only two critical benefits have been highlighted within this post, more information can be found in this post by my colleague Steve Novak, and more information about our data migration software, Applaud, can be found here.