Five Steps to Implementing MDM
by Mike Rosen, Director, Cutter Consortium Enterprise Architecture Practice
In my recent Advisor ("How to Make MDM Go: Start with Architecture"), I discussed the role of enterprise information architecture in Master Data Management (MDM). In this Advisor, I look at the steps to implementing MDM once you have your information architecture in place.
The first step is to identify the sources and consumers of the master data. It should not be too difficult to identify the applications and data stores that hold data related to master business entities. The harder challenge is to decide which of them should be the "owner" of the data, or the system of record (SOR). Typically, there will be multiple SORs, each owning a different subset of a master entity's overall data. Often we assign ownership to the system that makes the most changes to the data. For example, a CRM system might be the SOR for base customer information (name, contact, relationships, etc.), whereas customer information that is specific to a particular product or service (checking account, insurance policy, etc.) would be owned by the application that manages that product.
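To make the ownership idea concrete, here is a minimal Python sketch, assuming we can estimate how often each candidate system changes each attribute. It applies the "system that makes the most changes" heuristic attribute by attribute, so different attributes of the same master entity can end up with different SORs. The function names, system names, and change counts are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical input: attribute -> {candidate system: estimated changes per month}.
ChangeCounts = Dict[str, Dict[str, int]]


def assign_sor(change_counts: ChangeCounts) -> Dict[str, str]:
    """Assign each master attribute to the system expected to change it most.

    Ties are broken alphabetically by system name so the result is repeatable.
    """
    owners: Dict[str, str] = {}
    for attribute, counts in change_counts.items():
        if not counts:
            raise ValueError(f"no candidate systems for {attribute!r}")
        # max() keeps the first maximal item it sees; iterating over the sorted
        # system names makes that the alphabetically first of any tied systems.
        owners[attribute] = max(sorted(counts), key=lambda system: counts[system])
    return owners


def attributes_owned_by(owners: Dict[str, str], system: str) -> List[str]:
    """Return the subset of master attributes for which `system` is the SOR."""
    return sorted(attr for attr, owner in owners.items() if owner == system)


if __name__ == "__main__":
    # Illustrative estimates only; the system names are hypothetical.
    estimates: ChangeCounts = {
        "customer.name": {"CRM": 120, "CheckingApp": 5},
        "customer.contact": {"CRM": 300, "CheckingApp": 40, "InsuranceApp": 25},
        "checking.overdraft_limit": {"CheckingApp": 60},
        "insurance.policy_number": {"InsuranceApp": 15, "CRM": 1},
    }
    owners = assign_sor(estimates)
    print(attributes_owned_by(owners, "CRM"))
```

With these made-up numbers the CRM ends up owning `customer.contact` and `customer.name`, while the product applications own their product-specific attributes. Keeping the result in a dictionary keyed by attribute guarantees exactly one SOR per attribute; in practice the counts would come from analysis of the systems rather than from a literal.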
You can immediately see the challenge. How does MDM coordinate and synchronize the data that overlaps among all the different systems (such as name and contact information)? Before trying to address that piecemeal, it is best to define the overall master data model. This model specifies the details of the master entities, which should already have been identified in the enterprise information model. The master data model should be a subset of the overall enterprise model but will typically contain more detail at the attribute level. As we identify the applications that will participate in MDM, we refine the master model to include the important attributes needed for the master entities. Some applications will add new attributes to the model; others will not. In addition to the attributes, we define metadata that provides traceability between the MDM and the lineage (origin) of each data element. At the same time, we identify the metadata associated with each attribute in each data store.
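One way to picture an attribute-level master model with traceability metadata is the following Python sketch. It assumes a simple in-memory representation: each master attribute keeps a list of the physical fields (system, column, declared type) that hold it, and registering a participating application's field either extends an existing attribute's lineage or adds a new attribute to the model. The class, method, and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SourceField:
    """Metadata for one attribute as it is stored in one participating data store."""
    system: str      # hypothetical system name, e.g. "CRM"
    column: str      # physical field name in that system
    data_type: str   # type as declared in that system, e.g. "VARCHAR(120)"


@dataclass
class MasterAttribute:
    name: str
    data_type: str
    sources: List[SourceField] = field(default_factory=list)  # lineage to each store


@dataclass
class MasterEntity:
    name: str
    attributes: Dict[str, MasterAttribute] = field(default_factory=dict)

    def register_source(self, attr: str, data_type: str, src: SourceField) -> bool:
        """Record that `src` holds `attr`; return True if this added a new attribute."""
        is_new = attr not in self.attributes
        if is_new:
            self.attributes[attr] = MasterAttribute(attr, data_type)
        self.attributes[attr].sources.append(src)
        return is_new

    def lineage(self, attr: str) -> List[str]:
        """Trace a master attribute back to every system and column that holds it."""
        return [f"{s.system}.{s.column}" for s in self.attributes[attr].sources]
```

For example, registering a CRM `EMAIL_ADDR` column and then a billing system's `cust_email` column under the same `email` attribute adds the attribute once and records both origins. Because each `SourceField` keeps the type declared in its own store, differences between stores are visible directly in the model.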
A data-profiling tool can be of great assistance in collecting and analyzing the metadata, and it can identify some of the areas where the data stores differ. Armed with this information, we can begin to address rationalization and synchronization. This involves defining a variety of business rules, such as the following:
How data is transformed between the SOR and the master
How data is transformed between the master and each copy
How to arbitrate when different sources have different values of the data
How to recognize and combine synonyms
How to synchronize data across all the copies when the master changes
Which systems are permitted to change the data
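Two of these rules, arbitration and synonym recognition, can be sketched in a few lines of Python. The sketch assumes one specific arbitration policy (the SOR's non-empty value wins; otherwise the most recently updated non-empty value wins) and a small hypothetical synonym table for address abbreviations. Real rule sets are usually richer and business-specific, so treat this as an illustration rather than a recommendation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical synonym table, used only when comparing values.
SYNONYMS: Dict[str, str] = {"st": "street", "st.": "street", "ave": "avenue", "ave.": "avenue"}


def normalize(value: str) -> str:
    """Lower-case, split on whitespace, and replace known synonyms token by token."""
    return " ".join(SYNONYMS.get(token, token) for token in value.lower().split())


@dataclass
class Candidate:
    system: str            # system that supplied the value
    value: Optional[str]   # None or "" means the system has no value
    updated: datetime      # when that system last changed the value


def has_conflict(candidates: List[Candidate]) -> bool:
    """True if the sources still disagree after synonyms are normalized."""
    return len({normalize(c.value) for c in candidates if c.value}) > 1


def arbitrate(candidates: List[Candidate], sor: str) -> Optional[str]:
    """Choose the surviving value for one attribute under the assumed policy."""
    present = [c for c in candidates if c.value]
    if not present:
        return None
    for c in present:
        if c.system == sor:
            return c.value  # the system of record wins outright
    # No SOR value available: fall back to the most recently updated source.
    return max(present, key=lambda c: c.updated).value
```

Normalization is used here only to decide whether values agree; the surviving value is returned exactly as its source stored it, so converting it to a canonical form would be a separate transformation rule.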
Oops ... did I say that data could be changed by more than one system? Unfortunately, this is unavoidable because versions of the data reside in many existing systems, and those systems have functions and interfaces that can change the data. So we also need ways to synchronize changes between those systems and the SOR. As you can see, an MDM system can become rather complex. Over time, we'll want to manage that complexity by limiting the number of systems that can change the data, through changes both to processes and to the systems themselves.
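The tension described here, several systems that can change the data and a master that must keep every copy in sync, can be illustrated with a toy hub in Python. This is a minimal sketch under simplifying assumptions: everything is in memory, each attribute has an explicit set of systems permitted to change it, and an accepted change is queued for every other system holding a copy. The class name, method name, and system names are hypothetical; a production hub would also need persistence, messaging, per-copy transformations, and conflict handling.

```python
from typing import Dict, List, Set, Tuple


class MasterHub:
    """Toy in-memory hub (hypothetical): accepts changes only from permitted
    systems and fans each accepted change out to every other copy."""

    def __init__(self, writers: Dict[str, Set[str]], copies: Set[str]) -> None:
        self.writers = writers   # attribute -> systems allowed to change it
        self.copies = copies     # systems that hold a copy of the master data
        self.master: Dict[str, str] = {}
        self.outbox: List[Tuple[str, str, str]] = []  # (target system, attribute, value)

    def apply_change(self, source: str, attribute: str, value: str) -> bool:
        """Apply a change from `source`; return False if the source may not write it."""
        if source not in self.writers.get(attribute, set()):
            return False  # unauthorized writer: reject instead of overwriting
        if self.master.get(attribute) == value:
            return True   # unchanged value: nothing to propagate
        self.master[attribute] = value
        # Queue the update for every copy except the system that made the change.
        for target in sorted(self.copies - {source}):
            self.outbox.append((target, attribute, value))
        return True
```

For instance, if a call-center application is permitted to update a customer's phone number, an accepted change is queued for the CRM and billing copies, while the same change attempted by billing is rejected. Shrinking a `writers` set over time is the code-level counterpart of limiting the number of systems that can change the data.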
Next, we turn to the data flow portion of our enterprise information architecture to understand how we will move the data from the sources to the master. This includes implementing the business rules for transformation and cleansing, and together these constitute the ETL (extract, transform, and load) required for the MDM. We find that it's best to implement a portion of the data, test it, identify the discrepancies and other issues, develop and implement a strategy for cleaning them up, refine the model as necessary, and repeat until all the issues have been resolved and all the data sources have been integrated. This can take some time (often a lot of time) because, while a profiling tool helps, it won't identify all the issues. So another best practice is to roll out an initial version of MDM for the first solution and demonstrate some value, then add additional sources and solutions iteratively. A good versioning strategy will go a long way toward supporting this incremental approach.
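The iterative load described above can be sketched as a single ETL increment in Python: extract rows from one source, apply transformation and cleansing rules in order, upsert the clean rows into the master, and set the rest aside with a reason so the next iteration knows what to fix. The row format, the two example rules, and the function names are assumptions for illustration; a real implementation would usually rely on an ETL or data-integration tool rather than hand-written loops.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, str]
Rule = Callable[[Row], Row]


@dataclass
class LoadReport:
    loaded: int = 0
    rejected: List[Tuple[Row, str]] = field(default_factory=list)  # (raw row, reason)


def run_increment(source_rows: Iterable[Row], rules: List[Rule], required: List[str],
                  master: Dict[str, Row], key: str) -> LoadReport:
    """Extract, transform, and load one slice of source data into `master`.

    Rows that lack the key or a required field after transformation are not
    loaded; they are reported so the discrepancy can be investigated.
    """
    needed = [key] + [f for f in required if f != key]
    report = LoadReport()
    for raw in source_rows:                 # extract
        row = dict(raw)
        for rule in rules:                  # transform and cleanse, in order
            row = rule(row)
        missing = [f for f in needed if not row.get(f)]
        if missing:
            report.rejected.append((raw, "missing " + ", ".join(missing)))
            continue
        # Load as an upsert: merge the new values over any existing master record.
        master[row[key]] = {**master.get(row[key], {}), **row}
        report.loaded += 1
    return report


# Two hypothetical cleansing rules.
def strip_whitespace(row: Row) -> Row:
    return {k: v.strip() for k, v in row.items()}


def upper_country(row: Row) -> Row:
    return {**row, "country": row.get("country", "").upper()}
```

Running `run_increment` once per source and feeding the rejected rows back into rule and model refinement mirrors the implement, test, clean up, and repeat cycle; integrating another source is then just another increment.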
Once all this effort is complete, it is important to define governance processes and stewardship responsibilities to keep the data clean and up to date. Perhaps a topic for a future Advisor? I encourage you to send your insights to me at comments@cutter.com.
Sincerely,
Mike Rosen, Director
Enterprise Architecture Practice

