Compare duplicates for 4 fields in open SQL

Question

I want to compare if there are duplicates across 4 fields in open SQL.

Scenario: User has 4 fields to input. First name (N1), last name (N2), additional first name (N3) and additional last name (N4).

Right now the algorithm works this way: It concatenates N1 + N2 + % and then also N2+ N1 + %. So if the user inputs in any of the fields, the query looks for N1N2% or N2N1%. This mean for 2 fields, there are 2! combinations possible. Now with 2 additional fields, this algorithm explodes as there will be 4! combinations to check. Any ideas how to tackle this?

Note: We do this kind of combination check because the user could input data in any of those given input field. So we check for all combination of fields. Unfortunately, this cannot be changed.

EDIT: I cannot assume the order as it was previously designed in such a way. Hence, the complications with combinations.

Edit2: I like the idea of checking individual parts. But what we want to do is ideally concatenate all strings together and check for a substring in DB. In open-sql its done using the like statement. Our DB table has such concatenated string already stored for N1+N2 combination. This needs to be extended for 4 fields now.

Your question is very unclear. Are you trying to find a user in the database based upon the user details entered in some form? Can't you assume even the correct order of the fields? — lilalinux
Why does it have to be done with an OpenSQL query and why can't you use regular expressions for that purpose? — Jagger
Try %N1% AND %N2% AND %N3% AND %N4% AND (N1% OR N2% OR N3% OR N4%) and check that the total size is exactly the sum of all parts — lilalinux
Do you really want the db entry the start with an arbitrary order of the name parts followed by potentially any other text? — lilalinux

lilalinux lilalinux · Accepted Answer · 2016-06-01T15:11:09

The key to your problem is checking all name parts individually with leading and trailing '%' and check the total size of the db entry against the sum of the name parts:

field = ('%' + N1 + '%') AND field = ('%' + N2 + '%') AND field = ('%' + N3 + '%') AND field = ('%' + N4 + '%') AND LENGTH(field) = LENGTH(N1+N2+N3+N4)

This will find a match. You could use it to SELECT a normalized concatenation of the names and use GROUP BY and HAVING count(*)>1 to search for duplicates.

Compare duplicates for 4 fields in open SQL

2 Answers