Technically, data migration involves three components: extract, transform and load. These are commonly referred to as ETL.
From a non-technical perspective, data migration should involve, among other things: understanding legacy system use cases so that key ones can be preserved; developing field and record exclusion criteria; and making recommendations for better data management within the new system. But more on the business side of data migration in a future post.
The data extraction component of ETL, which should be easy, can have some unexpected complexities.
An important question to ask of your implementation partner is whether they are a specialist in extracting data from your specific legacy system(s). If your partner is counting on you to provide data extracts, do you have the internal skill sets and/or time? What happens if you get well into your CRM implementation project and find out that nobody knows how to properly extract data from your legacy system(s)?
When Standard Data Repositories Are Non-Standard
Often, legacy data is in a seemingly “standard” store such as Microsoft SQL server or MySQL, so data extraction should be easy, right?
Not necessarily. Just because a data source is a standard repository, it doesn’t mean that the data itself is in a simple or standard format and that extraction is a straightforward exercise.
1. SQL Views
What’s exposed in the user interface of a legacy system can, under the hood, be a SQL view that’s comprised of multiple database tables. For someone who is not familiar with the legacy system’s structure, it can take a lot of effort to determine what’s behind the data that’s displayed to users.
2. Volume of Tables
Legacy CRM systems can sometimes have hundreds of tables with many joins. If your implementation partner doesn’t have experience extracting data from a specific legacy CRM system, they may not be able to identify all the tables from which data needs to be extracted.
3. Formatting Tags
What is readable in a to a user in a rich text field in the legacy system may have proprietary formatting tags behind the scenes. For example, while later versions of GoldMine run exclusively on Microsoft SQL Server, the memo field of the Mailbox (email) table contains proprietary tags that need to be removed in order for the email body to be readable in a new CRM system.
4. BLOB Fields
As one example, file attachments can be stored directly in legacy database within BLOB (binary large object) fields. In this case, attachments can only be extracted via the system’s API — a direct extraction would yield large amounts of meaningless binary data.
If you’re halfway through a CRM implementation project and discover that there’s no one who can properly extract legacy system data, can you call in a data extraction specialist to save the day? Yes, but that specialist may need to understand and get involved in the bigger picture.
Data extraction should be part of the overall implementation process and will not be optimal if performed in isolation and/or in a time crunch. It’s always better to ensure ahead of time that informed data extraction is part of the project plan.