2
votes

I have a requirement in which I need to import data from excel (CSV) to Dynamics CRM regularly.

Instead of using some simple Data Duplication Rules, I need to implement a point system to determine whether a data is considered duplicate or not.

Let me give an example. For example these are the particular rules for Import:

  1. First Name, exact match, 10 pts
  2. Last Name, exact match, 15 pts
  3. Email, exact match, 20 pts
  4. Mobile Phone, exact match, 5 pts

And then the Threshold value => 19 pts

Now, if a record have First Name and Last Name matched with an old record in the entity, the points will be 25 pts, which is higher than the threshold (19 pts), therefore the data is considered as Duplicate

If, for example, the particular record only have same First Name and Mobile Phone, the points will be 15 pts, which is lower than the threshold and thus considered as Non-Duplicate

What is the best approach to achieve this requirement? Is it possible to utilize the default functionality of Import Data in the MS CRM? Is there any 3rd party Add-on that answer my requirement above?

Thank you for all the help.

Updated

Hi Konrad, thank you for your suggestions, let me elaborate here:

Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.

Nice one but I don't think it is really workable in my case, the data will be coming regularly from client in moderate numbers (hundreds to thousands). Typically client won't check about the duplication on the data.

Workflow. Run a process removing any instance calculated as a duplicate.

Workflow is a good idea, however since it is being processed asynchronously, my concern is the user in some cases may already do some update/changes to the data inserted, before the workflow finish working.. therefore creating some data inconsistency or at the very least confusing user experience

Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel it's creation (or mark for removal).

I like this approach. So I just import like usual (for example, to contact entity), but I already have a plugin in place that getting triggered every time a record is created, the plugin will check whether the record is duplicat-ish or not and took necessary action.

2
OK, if we're talking 10^6 records, you will need to consider the performance issue. You can set the plugin as async (but that can create other issues). Also, after consulting with my colleague I realize that my suggestion, while 100% correct, is very generic. I assumed that you'd have more fields and a more sophisticated definition of unique-ish-ness. But if you're entirely sure that you'll keep this particular rule set, you might want to go with the pragmatic approach of three duplication detection rules as suggested by Rickard. WHat's your budget for this case? Any? - Konrad Viltersten

2 Answers

1
votes

I can think of the following approaches to the task (depending on the number of records, repetitiveness of the import, automatization requirement etc.) they may be all good somehow. Would you care to elaborate on the current conditions?

  1. Excel. You could filter out the data using Excel and then, once you've obtained a unique list, import it.

  2. Plugin. On every creation of a new record, you'd check if it's to be regarded as duplicate-ish and cancel it's creation (or mark for removal).

  3. Workflow. Run a process removing any instance calculated as a duplicate.

You also need to consider the implication of such elimination of data. There's a mathematical issue. Suppose that the uniqueness' radius (i.e. the threshold in this 1D case) is 3. Consider the following set of numbers (it's listed twice, just in different order).

1 3 5 7    ->    1 _ 5 _
3 1 5 7    ->    _ 3 _ 7

Are you sure that's the intended result? Under some circumstances, you can even end up with sets of records of different sizes (only depending on the order). I'm a bit curious on why and how the setup came up.

Personally, I'd go with plugin, if the above is OK by you. If you need to make sure that some of the unique-ish elements never get omitted, you'd probably best of applying a test algorithm to a backup of the data. However, that may defeat it's purpose.

In fact, it sounds so interesting that I might create the solution for you (just to show it can be done) and blog about it. What's the dead-line?

2
votes

I haven't been fiddling a lot with duplicate detection but looking at your criteria you might be able to make rules that match those, pretty much three rules to cover your cases, full name match, last name and mobile phone match and email match.

If you want to do the points system I haven't seen any out of the box components that solve this, however CRM Extensions have a product called Import Manager that might have that kind of duplicate detection. They claim to have customized duplicate checking. Might be worth asking them about this.

Otherwise it's custom coding that will solve this problem.