Integrating data between systems carries with it multitudinous risks, including race conditions, user confusion, and difficulty setting expectations.  One of the most insidious and destructive risks is data homogenization.

Imagine you want to integrate Dynamics CRM with HubSpot.  You’ve worked with your LyntonWeb agent to map out all the fields in perfect detail, and you’ve built out and tested the sync successfully on a few sample records.  Everything seems to be working correctly and on schedule.  What could go wrong after this point?

Uniqueness

Different systems use different methods to denote what a “unique” record is, and potentially still other methods to determine when one record is a match for a given search.  For example, HubSpot requires each Contact to have a unique email address (the exception being Contacts imported with a blank email).  This is atypical compared to many other systems, but is understandable given that HubSpot was first and foremost built around electronic marketing processes.

This presents a curious question: what happens when another system does not follow this restriction on unique emails?  In out-of-box Dynamics CRM, there are no restrictions on email addresses for Contacts or Leads.  The only unique identifier that CRM is concerned with is its own internal “GUID” value (Globally Unique Identifier), which the average CRM user will never see or need to know.  There are ways to set up duplicate detection rules which can help constrain data, but most clients using Dynamics CRM have never seen any reason to enforce such a restriction.
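
To make the mismatch concrete, here is a minimal Python sketch, assuming Contacts have been exported from Dynamics as plain dictionaries with hypothetical `guid` and `email` keys (this is not the Dynamics API). It groups records by a normalized email and reports every address shared by more than one GUID; each such group would collapse into a single HubSpot Contact. Lowercasing and trimming the address is our assumption about how email matching behaves.

```python
from collections import defaultdict


def normalize_email(email):
    """Lowercase and trim an address (an assumed approximation of email matching)."""
    return (email or "").strip().lower()


def find_email_collisions(crm_contacts):
    """Return {email: [guid, ...]} for every email shared by 2+ CRM Contacts.

    Blank emails are skipped, since this sketch never matches on a blank value.
    """
    groups = defaultdict(list)
    for contact in crm_contacts:
        email = normalize_email(contact.get("email"))
        if email:
            groups[email].append(contact["guid"])
    return {email: guids for email, guids in groups.items() if len(guids) > 1}


crm_contacts = [
    {"guid": "a1", "email": "pat@example.com"},
    {"guid": "b2", "email": "Pat@Example.com "},
    {"guid": "c3", "email": "lee@example.com"},
    {"guid": "d4", "email": None},
]
print(find_email_collisions(crm_contacts))
# {'pat@example.com': ['a1', 'b2']}
```

Two records that look different to a person scanning CRM (note the capital letters and trailing space) still land in the same group, which is exactly the kind of duplicate a client may not know they have.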

Timing

In our example, then, let’s say the integration was enabled and set to sync every record modified from that point forward.  One day, someone modifies a Contact record in Dynamics that happens to have the same email address as another Contact that previously synced to HubSpot.  Assuming the integration is designed to take the data it receives from CRM on “good faith”, it will simply update that HubSpot Contact with the recently modified CRM Contact’s data.  This means that users in HubSpot may see data from either of two different CRM Contact records, depending on which one was most recently modified in CRM.  However, as confusing as this can be for a client, it isn’t the greatest potential danger.
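
A minimal sketch of that “good faith” one-way sync, assuming HubSpot Contacts are modeled as a dictionary keyed by lowercase email and CRM changes arrive in the order they were modified. The function and field names (including `source_guid`) are hypothetical and only illustrate the behavior described above: the second CRM Contact silently replaces the first one’s data.

```python
def sync_crm_to_hubspot(hubspot, crm_contact):
    """Upsert a CRM Contact into a HubSpot store keyed by lowercase email.

    Incoming data is trusted as-is, so if another CRM Contact with the same
    email already synced, its values are overwritten.
    """
    key = crm_contact["email"].strip().lower()
    record = hubspot.setdefault(key, {"email": key})
    record["name"] = crm_contact["name"]
    record["source_guid"] = crm_contact["guid"]  # hypothetical: last CRM writer
    return record


hubspot = {}
# Two distinct CRM Contacts sharing an email, in order of last modification.
changes = [
    {"guid": "a1", "name": "Pat Smith", "email": "pat@example.com"},
    {"guid": "b2", "name": "Patricia Jones", "email": "pat@example.com"},
]
for change in changes:
    sync_crm_to_hubspot(hubspot, change)

print(hubspot["pat@example.com"])
# {'email': 'pat@example.com', 'name': 'Patricia Jones', 'source_guid': 'b2'}
print(len(hubspot))  # 1
```

Reversing the order of `changes` would leave “Pat Smith” in HubSpot instead, which is why the result depends entirely on modification timing.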

Homogenization

The average integration not only copies data from one system to another, but also in the reverse direction, so that users in each system can see the latest changes and activity from users working in the other.  If the target HubSpot system is also set to copy some or all of its data back to CRM, there are many important questions to ask, such as how to set the criteria that determine when a HubSpot record is eligible to sync back to Dynamics.  However, what will happen when that same HubSpot Contact syncs back to Dynamics, and the integration treats its email as a unique matching identifier?

The problem should be obvious.  Because both Dynamics records share the same email, both can match the returning HubSpot Contact, so data that originally came from one of them can be written onto the other.  The presence of a duplicate Dynamics record, with both eligible to sync to HubSpot, has resulted in some CRM data being destructively overwritten.  Furthermore, no one would have any way of predicting which Dynamics record would “win” the battle and have its data take precedence, because that depends entirely on when each record was last modified.
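
The return leg can be sketched the same way.  This is a toy model, assuming the sync back to Dynamics matches CRM records on email and updates every match; `sync_hubspot_to_crm` and the record layout are hypothetical.  An integration that updated only the first match would not avoid the damage, only change which record suffers it.

```python
def sync_hubspot_to_crm(crm_contacts, hubspot_record, fields):
    """Copy `fields` from a HubSpot Contact onto every CRM Contact whose email
    matches, treating email as if it were a unique identifier.

    Returns the GUIDs of the CRM records that were overwritten.
    """
    key = hubspot_record["email"]
    touched = []
    for contact in crm_contacts:
        if contact["email"].strip().lower() == key:
            for field in fields:
                contact[field] = hubspot_record[field]
            touched.append(contact["guid"])
    return touched


crm_contacts = [
    {"guid": "a1", "name": "Pat Smith", "email": "pat@example.com"},
    {"guid": "b2", "name": "Patricia Jones", "email": "pat@example.com"},
]
# The HubSpot record holds whichever CRM record synced to it last (here b2).
hubspot_record = {"email": "pat@example.com", "name": "Patricia Jones"}

print(sync_hubspot_to_crm(crm_contacts, hubspot_record, ["name"]))  # ['a1', 'b2']
print([c["name"] for c in crm_contacts])
# ['Patricia Jones', 'Patricia Jones']  (a1's original name is gone)
```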

Let’s complicate this issue even further by assuming that only some fields were set to copy from Dynamics to HubSpot, and yet different fields from HubSpot to Dynamics.  We’ll pretend that only the Name values are being synced from Dynamics, and only the Phone number is synced back from HubSpot.  Let’s also assume that the HubSpot record had its Phone number manually changed inside HubSpot (or it already existed and was different from what either Dynamics record held).  HubSpot would then show the Name of whichever Dynamics record was modified last, while both Dynamics records would receive HubSpot’s Phone number, a value neither of them originally held.
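
The field-level scenario above can be traced in a short Python sketch.  The field lists, the `push`/`pull` helpers, and the in-memory records are all hypothetical stand-ins for the two systems, assuming both directions match on email.

```python
# Hypothetical field mappings for the scenario above.
CRM_TO_HUBSPOT = ["name"]
HUBSPOT_TO_CRM = ["phone"]

crm_contacts = [
    {"guid": "a1", "email": "pat@example.com", "name": "Pat Smith", "phone": "555-0101"},
    {"guid": "b2", "email": "pat@example.com", "name": "Patricia Jones", "phone": "555-0202"},
]
# Existing HubSpot Contact whose Phone was edited directly in HubSpot.
hubspot = {"pat@example.com": {"email": "pat@example.com", "name": "", "phone": "555-0303"}}


def push(crm_contact):
    """CRM -> HubSpot: copy only the mapped fields, matched on email."""
    target = hubspot[crm_contact["email"]]
    for field in CRM_TO_HUBSPOT:
        target[field] = crm_contact[field]


def pull(email):
    """HubSpot -> CRM: copy only the mapped fields onto every email match."""
    source = hubspot[email]
    for contact in crm_contacts:
        if contact["email"] == email:
            for field in HUBSPOT_TO_CRM:
                contact[field] = source[field]


# Both CRM records are modified (a1, then b2), then HubSpot syncs back.
push(crm_contacts[0])
push(crm_contacts[1])
pull("pat@example.com")

print(hubspot["pat@example.com"])
# {'email': 'pat@example.com', 'name': 'Patricia Jones', 'phone': '555-0303'}
print([(c["name"], c["phone"]) for c in crm_contacts])
# [('Pat Smith', '555-0303'), ('Patricia Jones', '555-0303')]
```

The HubSpot Contact ends up as a hybrid of b2’s Name and a Phone number neither CRM record held, and both CRM records lose their own Phone numbers.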

As you can see, there is almost no limit to how badly things can become confused, and this may all have come about due to nothing but good intentions.  If the requirements of each system are not discussed in great detail, if the integration is not designed with care by someone experienced in these matters, or if the client is simply unaware they have duplicate data (which happens more often than you’d guess!), these issues may arise at any point after the integration has launched.

Solutions

At LyntonWeb, we often work with clients who are focused on marketing and are not technically inclined or interested in many details.  However, every client is interested in keeping their data intact and reliable, so this conversation is easy to start and commands attention.

Some of the first questions we ask of a client for any prospective bi-directional integration are:

  1. What record types need to be copied from system A to system B, and vice versa?

  2. What are the criteria by which you want records to be eligible to sync from system A to system B, and vice versa?

  3. What fields do you want to copy in each direction for each record type?  Do any fields need special transformation to be accepted into the other system?

  4. Are you aware of any duplicate data you may have in each system, as determined by the uniqueness requirements of both systems?
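
One way to record the answers to the first three questions is as a per-direction sync rule.  The sketch below is purely illustrative: `SyncRule`, `FieldMap`, and the `status` criterion are hypothetical, and the source field names are modeled on Dynamics Contact attributes (`fullname`, `emailaddress1`) for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FieldMap:
    source: str
    target: str
    # Optional conversion so the value is accepted by the other system (question 3).
    transform: Callable[[object], object] = lambda value: value


@dataclass
class SyncRule:
    record_type: str                          # question 1
    direction: str                            # e.g. "crm->hubspot"
    is_eligible: Callable[[dict], bool]       # question 2
    fields: List[FieldMap] = field(default_factory=list)  # question 3

    def build_payload(self, record: dict) -> Dict[str, object]:
        """Return the mapped target values, or {} if the record is not eligible."""
        if not self.is_eligible(record):
            return {}
        return {m.target: m.transform(record.get(m.source)) for m in self.fields}


contact_rule = SyncRule(
    record_type="contact",
    direction="crm->hubspot",
    # Hypothetical criterion: only active Contacts that have an email.
    is_eligible=lambda r: bool(r.get("emailaddress1")) and r.get("status") == "active",
    fields=[
        FieldMap("fullname", "name"),
        FieldMap("emailaddress1", "email", transform=lambda v: v.strip().lower()),
    ],
)

print(contact_rule.build_payload(
    {"fullname": "Pat Smith", "emailaddress1": " Pat@Example.com", "status": "active"}
))
# {'name': 'Pat Smith', 'email': 'pat@example.com'}
print(contact_rule.build_payload({"fullname": "No Email", "status": "active"}))  # {}
```

Writing the rules down this way makes each direction’s eligibility criteria and field list explicit, so gaps and overlaps between the two directions are easier to spot with the client.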

Asking these questions upfront, and analyzing the answers along with the client, often greatly reduces the chances for data homogenization and other risks before development has even started.  If duplicate data is identified, we may suggest the following solutions:

  • Assistance in identifying any duplicate data in each system, if the client is able and willing to clean their data before the integration is under way.  The client will also need to institute good policies to prevent their users from creating more duplicate data in the future.

  • Identify additional and more restrictive criteria for sync eligibility for each type of record involved in the integration.  Ideally, this would mean that duplicate records with incomplete data would not be selected for sync.

  • Research other methods of identifying matching target records, in the hope of avoiding having multiple records sync into one.  This may or may not be possible depending on how a system allows us to interact with it programmatically.

  • Add logic that pre-checks the existing data in the target system and compares it with the incoming value before writing to a given field.  This leads to a slower integration, but in some cases may be necessary.
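
The last two suggestions can be sketched together.  This assumes a hypothetical custom HubSpot property, `crm_guid`, written on first sync so the return leg can match on the Dynamics GUID instead of email; when no GUID is stored and the email matches several records, the sketch reports the ambiguity rather than guessing.  The pre-check shown is one possible policy (only fill blank fields, log everything else), not the only one.

```python
def find_crm_match(crm_contacts, hubspot_record):
    """Return (match, reason), preferring a stored CRM GUID over email.

    `crm_guid` is a hypothetical HubSpot property holding the Dynamics GUID.
    """
    guid = hubspot_record.get("crm_guid")
    if guid:
        matches = [c for c in crm_contacts if c["guid"] == guid]
    else:
        email = hubspot_record["email"]
        matches = [c for c in crm_contacts if c["email"] == email]
    if len(matches) == 1:
        return matches[0], "ok"
    return None, ("no match" if not matches else "ambiguous")


def apply_with_precheck(target, updates, conflicts):
    """Write each field only if the target is blank or already holds the same
    value; otherwise log a conflict instead of overwriting."""
    for field, value in updates.items():
        current = target.get(field)
        if current in (None, "") or current == value:
            target[field] = value
        else:
            conflicts.append((target["guid"], field, current, value))


crm_contacts = [
    {"guid": "a1", "email": "pat@example.com", "phone": "555-0101"},
    {"guid": "b2", "email": "pat@example.com", "phone": ""},
]
conflicts = []

# Without a stored GUID, the shared email is reported rather than guessed at.
print(find_crm_match(crm_contacts, {"email": "pat@example.com"}))
# (None, 'ambiguous')

# With a stored GUID, only the intended record is touched.
match, reason = find_crm_match(crm_contacts, {"email": "pat@example.com", "crm_guid": "b2"})
apply_with_precheck(match, {"phone": "555-0303"}, conflicts)  # blank, so filled
apply_with_precheck(match, {"phone": "555-0404"}, conflicts)  # differs, so logged
print([c["phone"] for c in crm_contacts])  # ['555-0101', '555-0303']
print(conflicts)  # [('b2', 'phone', '555-0303', '555-0404')]
```

This conservative policy also routes legitimate later edits to the conflict list; a production design would more likely compare against the last value the integration itself wrote, at the cost of storing that history.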