The beneficial impact of a central repository and process is enormous (read more here). However, to realize those benefits, the repository must have certain important features. Over the years, we have honed, and continue to optimize, the data repository that our data migration software, Applaud®, uses. If Premier is not part of your implementation, the following features should be part of whatever repository is used to facilitate the data migration. If Premier is performing the migration, Applaud provides all of them.

1. Quickly replicate the legacy and target table structures

The data repository should be able to automatically create the legacy and target data structures without a DBA or hand-written SQL scripts. Throughout a project, new data sources are discovered and require immediate access. If it takes more than a minute to set up the metadata and bring in multiple data sources, project timelines will suffer.
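
As an illustration only (this is not Applaud's implementation), the sketch below shows what this kind of automated structure replication can look like in Python using SQLAlchemy reflection; the connection strings are hypothetical placeholders.

```python
# Minimal sketch: replicate a legacy system's table structures into a
# staging repository with no hand-written DDL. Connection strings are
# illustrative placeholders, not real systems.
from sqlalchemy import create_engine, MetaData

legacy = create_engine("oracle+oracledb://user:pw@legacy-host/ERP")  # hypothetical source
repo = create_engine("postgresql://user:pw@repo-host/migration")     # hypothetical repository

# Reflect every table definition (columns, types, keys) from the source...
metadata = MetaData()
metadata.reflect(bind=legacy)

# ...and recreate the same structures in the repository in one call.
# (In practice, dialect-specific column types may need a mapping step.)
metadata.create_all(bind=repo)
print(f"Replicated {len(metadata.tables)} table structures")
```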

2. Easily create and alter table structures that reside only in the repository

For the repository to serve as an easy-to-use sandbox, creating additional tables and columns that exist only in the repository needs to be simple. Specifications, data enhancement/cleansing spreadsheets, and cross-reference information will change constantly throughout the course of the project. To keep up with the constant requests, data structures need to be created, dropped, and altered on the fly.
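
For example, a repository-only staging table can be created directly from an incoming spreadsheet and altered moments later. This is a sketch using pandas and SQLAlchemy, not Applaud's method; the file, table, and column names are hypothetical.

```python
# Minimal sketch: land a cleansing spreadsheet as a repository-only table,
# then alter it as the spec changes. All names are illustrative.
import pandas as pd
from sqlalchemy import create_engine, text

repo = create_engine("postgresql://user:pw@repo-host/migration")  # hypothetical

# A new cross-reference arrives from the business; stage it immediately.
xref = pd.read_excel("vendor_xref_v3.xlsx")                       # hypothetical file
xref.to_sql("xref_vendor", repo, if_exists="replace", index=False)

# When the spec changes an hour later, altering (or dropping) is just as quick.
with repo.begin() as conn:
    conn.execute(text("ALTER TABLE xref_vendor ADD COLUMN reviewed BOOLEAN"))
```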

3. Gracefully handle bad data

Data migration projects are all about handling both bad and good data... but mostly bad. If the migration process can't easily handle bad or invalid data, the project will struggle to succeed. The repository should gracefully handle character data in numeric fields, invalid dates, and the like without losing rows. If rows are lost upon insert into the repository, the integrity of the data is compromised, muddying further data analysis.
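
One common way to achieve this (shown here as a sketch, not as Applaud's approach) is to land every column as text so no row can be rejected, then flag, rather than drop, values that fail typed conversion. The file and column names below are hypothetical.

```python
# Minimal sketch: land everything as text so no row is ever rejected,
# then flag (rather than drop) values that fail typed conversion.
import pandas as pd

raw = pd.read_csv("legacy_orders.csv", dtype=str)  # hypothetical extract; all columns as text

orders = raw.copy()
orders["qty"] = pd.to_numeric(raw["qty"], errors="coerce")        # 'N/A' becomes NaN, row kept
orders["ship_date"] = pd.to_datetime(raw["ship_date"], errors="coerce")

# Every original row survives; the bad values are now queryable facts.
bad = orders[orders["qty"].isna() | orders["ship_date"].isna()]
print(f"{len(raw)} rows landed, {len(bad)} flagged for cleansing")
```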

4. Beyond simple to get data in and out

Moving data from environment to environment is the core procedure of a data migration. To maximize effectiveness, the movement of data should require minimal effort. When possible, a direct database connection should be used, but in some legacy environments, mainframes especially, that's not always possible.

However, importing or exporting flat files, EBCDIC, or any other file format should be a simple process. In the case of mainframes, being able to natively handle EBCDIC, packed numerics, repeating segments, etc., is an immeasurable risk reducer; several of our engagements could not have succeeded without those capabilities. Whatever the format, there could be thousands of tables/files, and if it takes more than a small amount of time to get at that data, the effort can quickly devolve into an unmanageable process. (Read how we successfully handled over 3,000 data sources)
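
To make "natively handle EBCDIC and packed numerics" concrete, here is a small illustrative decoder (not Applaud's code) for a fixed-width EBCDIC record containing a COMP-3 packed-decimal field; the record layout is invented for the example.

```python
# Minimal sketch: decode a fixed-width EBCDIC record containing a
# COMP-3 (packed decimal) field. The layout and values are illustrative.
def unpack_comp3(raw: bytes, scale: int = 0) -> float:
    """Decode packed decimal: two digits per byte, sign in the final nibble."""
    digits = ""
    for b in raw[:-1]:
        digits += f"{b >> 4}{b & 0x0F}"
    digits += str(raw[-1] >> 4)                   # last byte: one digit + sign nibble
    sign = -1 if (raw[-1] & 0x0F) == 0x0D else 1  # 0xD marks a negative value
    return sign * int(digits) / (10 ** scale)

# Hypothetical 9-byte record: 6 EBCDIC text bytes + 3 packed-decimal bytes.
record = bytes.fromhex("C1C2C3F0F0F1") + b"\x12\x34\x5C"

cust_id = record[0:6].decode("cp037")         # EBCDIC text -> 'ABC001'
balance = unpack_comp3(record[6:9], scale=2)  # packed +12345, 2 decimals -> 123.45
print(cust_id, balance)
```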

5. Fast to build components that analyze the data once it’s in the repository

The repository must make it easy to query, combine, harmonize, and separate data, both within a single system and across the entire data landscape. If the repository doesn't readily support this cross-system analysis, its effectiveness is diminished.
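
Once both systems are staged side by side in the repository, that cross-system analysis reduces to an ordinary SQL query. The sketch below assumes hypothetical staged tables, erp_customers and plm_partners, and a hypothetical connection string.

```python
# Minimal sketch: cross-system analysis over tables staged in the repository.
# Table and column names are illustrative.
import pandas as pd
from sqlalchemy import create_engine

repo = create_engine("postgresql://user:pw@repo-host/migration")  # hypothetical

# Customers that exist in the legacy ERP but have no match in the target PLM.
orphans = pd.read_sql("""
    SELECT erp.cust_id, erp.cust_name
    FROM erp_customers AS erp
    LEFT JOIN plm_partners AS plm
           ON plm.legacy_key = erp.cust_id
    WHERE plm.legacy_key IS NULL
""", repo)
print(f"{len(orphans)} customers need harmonization before load")
```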

In addition to a centralized data repository, there are many other techniques and processes that further reduce data migration risk on ERP, PLM, and other complex implementations. If you have any questions about data migration processes, data issues, or ways to reduce data migration risk, email me at steve_novak@premierintl.com or call me at 773.549.6945.