
I have a large graph with nodes representing people. All of them have firstname and surname properties, and some also have a middlename property. I'm looking for nodes that might represent the same person, so I'm comparing different permutations of names. I currently compare surnames and the first initial of firstname (some nodes have only initials), but I can't figure out how to test middlename when it exists.

My current query is:

match (a:Author), (b:Author)
where
  a.surname=b.surname and
  ( a.firstname starts with 'A' and b.firstname starts with 'A')
return distinct a,b

My understanding is that OPTIONAL MATCH refers only to patterns, so that won't work. I can't find a way to write an if statement that makes sense.

It may be that it makes more sense for me to do this programmatically, rather than relying just on direct Cypher queries, but I was hoping to keep it really simple and just do it in Cypher.

Some examples to clarify what I want to do.

Example 1:

Node 1: firstname "John" middlename "Patrick" lastname "Smith"
Node 2: firstname "J" middlename "P" lastname "Smith"
Node 3: firstname "J" middlename "Q" lastname "Smith"
Node 4: firstname "J" lastname "Smith"

I want a query that will return nodes 1, 2, and 4 as 'matching'.

Example 2:

Node 1: firstname "Jane" lastname "Smith"
Node 2: firstname "J" middlename "P" lastname "Smith" 
Node 3: firstname "J" middlename "Q" lastname "Smith" 
Node 4: firstname "J" lastname "Smith"

Here, I want all 4 nodes, since the 'canonical' name doesn't have a middle name.
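For anyone who wants to try queries against Example 1, the nodes could be created roughly like this. This is only a sketch: it assumes the Author label and the surname property used in the query above (the examples say lastname), and it leaves middlename unset on the last node rather than storing an empty string.

```cypher
// Sample data for Example 1 (sketch; label and property names are assumed
// to match the query above, i.e. surname rather than lastname)
CREATE (:Author {firstname: 'John', middlename: 'Patrick', surname: 'Smith'}),
       (:Author {firstname: 'J', middlename: 'P', surname: 'Smith'}),
       (:Author {firstname: 'J', middlename: 'Q', surname: 'Smith'}),
       // no middlename property at all on this node
       (:Author {firstname: 'J', surname: 'Smith'})
```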

You might want to explain more clearly what you're trying to do, since "can't figure out how to test middle names if they exist" resulted in me telling you to use EXISTS and getting down-voted for it. - joslinm
Thanks, hopefully my expanded examples clarify things. FWIW, I didn't downvote you. To me, your comment was a sign that I hadn't clearly explained the problem I was trying to solve. - betseyb

1 Answer


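I think you need something like the following. The id(a) < id(b) condition stops a node from being paired with itself and keeps each pair from being returned twice: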

match (a:Author), (b:Author)
where
  id(a) < id(b) and
  ( a.surname=b.surname) and
  ( a.firstname starts with 'A' and b.firstname starts with 'A') and
  ( a.middlename=b.middlename OR a.middlename IS NULL OR b.middlename IS NULL)
return a,b

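The article "How to work with null" is a good reference for puzzles like this one. The key point is that a.middlename=b.middlename evaluates to null, not false, when either side is null, which is why the explicit IS NULL checks are needed.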
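One gap worth noting: with the sample data in the question, a strict a.middlename=b.middlename would not pair "Patrick" with "P". A minimal sketch of one way around that, assuming it is acceptable to compare only first initials (my assumption, not something the question states), which also drops the hard-coded 'A':

```cypher
// Sketch only: compares first initials instead of full names.
// Assumes firstname is always present and non-empty, as the question says.
MATCH (a:Author), (b:Author)
WHERE id(a) < id(b)
  AND a.surname = b.surname
  // same first initial, whatever the letter is
  AND left(a.firstname, 1) = left(b.firstname, 1)
  // middle names are compatible if either is missing or the initials agree
  AND (a.middlename IS NULL
       OR b.middlename IS NULL
       OR left(a.middlename, 1) = left(b.middlename, 1))
RETURN a, b
```

Two caveats. Comparing initials also pairs "Jane" with "John", just as the starts with 'A' test does. And the result is pairwise: in Example 1 the node with no middlename pairs with both the "P" and the "Q" nodes, even though those two do not pair with each other, so grouping around a 'canonical' node needs an extra step.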

EDIT: Let's break down the middlename condition with some pseudocode:

if (a.middlename is null) return true;
if (b.middlename is null) return true;
if (a.middlename is not null and b.middlename is not null and a.middlename!=b.middlename) return false;
if (a.middlename is not null and b.middlename is not null and a.middlename=b.middlename) return true;
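The same decision can also be written as a single Cypher expression, which may be handy if you want to see the outcome per pair rather than filter on it. This is a sketch, not the only way to do it; middlenameCompatible is just an illustrative alias.

```cypher
MATCH (a:Author), (b:Author)
WHERE id(a) < id(b) AND a.surname = b.surname
RETURN a, b,
       // a.middlename = b.middlename is null when either side is null;
       // coalesce turns that null into true, mirroring the pseudocode
       coalesce(a.middlename = b.middlename, true) AS middlenameCompatible
```

It relies on the equality returning null rather than false when a property is missing, so coalesce only supplies the "treat missing as a match" default.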