
I made some wrong moves in Neo4j, and now the graph has duplicate nodes. In each duplicate pair, the first node holds the full property set and the second node holds all of the relationships. The index is node_auto_index.

Nodes:

Id  Name  Age  From         Profession
1   Bob   23   Canada       Doctor
2   Amy   45   Switzerland  Lawyer
3   Sam   09   US
4   Bob
5   Amy
6   Sam

Relationships:

Id  Start  End  Type
1   4      6    Family
2   5      6    Family
3   4      5    Divorced

I am trying to avoid redoing the whole batch import. Is there a way to merge the nodes in Cypher based on the "Name" string property while keeping all of the properties and relationships?
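To make the intended result concrete, here is a minimal in-memory sketch in Python. It does not talk to Neo4j; the dictionaries are a hypothetical stand-in for the nodes and relationships listed above, and `merge_duplicates` is a hypothetical helper name. Like the queries in the answers, it treats the node that has `Age` as the property holder and the node without `Age` as the relationship owner, and it copies the properties onto the relationship owner so that no relationship has to be rewired.

```python
# Hypothetical in-memory model of the example graph (plain dicts, not a Neo4j API).
nodes = {
    1: {"Name": "Bob", "Age": 23, "From": "Canada", "Profession": "Doctor"},
    2: {"Name": "Amy", "Age": 45, "From": "Switzerland", "Profession": "Lawyer"},
    3: {"Name": "Sam", "Age": 9, "From": "US"},  # Age listed as 09 in the table
    4: {"Name": "Bob"},
    5: {"Name": "Amy"},
    6: {"Name": "Sam"},
}
# Relationship id -> (start node id, end node id, type)
rels = {
    1: (4, 6, "Family"),
    2: (5, 6, "Family"),
    3: (4, 5, "Divorced"),
}


def merge_duplicates(nodes):
    """Hypothetical helper: fold each property-only node into its same-Name twin."""
    by_name = {}
    for node_id, props in nodes.items():
        by_name.setdefault(props["Name"], []).append(node_id)

    merged = {}
    for ids in by_name.values():
        full = [i for i in ids if "Age" in nodes[i]]      # node holding the properties
        bare = [i for i in ids if "Age" not in nodes[i]]  # node holding the relationships
        if len(full) == 1 and len(bare) == 1:
            # Keep the relationship owner's id so no relationship needs rewiring.
            merged[bare[0]] = {**nodes[full[0]], **nodes[bare[0]]}
        else:
            # Not a clean duplicate pair: leave these nodes as they are.
            for i in ids:
                merged[i] = dict(nodes[i])
    return merged


merged = merge_duplicates(nodes)
print(sorted(merged))  # [4, 5, 6]
```

After the merge every relationship still points at surviving nodes (4, 5 and 6), which now carry the full property sets.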

Thank you!

I would just redo the batch import; it is definitely faster :) – Michael Hunger
Okay, I ended up doing that. For reference, the query I tried was still running two days later. – Olga Mu

2 Answers


Okay, I think I figured it out:

START first=node(*), second=node(*) 
WHERE has(first.Name) and has(second.Name) and has(second.Age) and NOT(has(first.Age))
WITH first, second
WHERE first.Name= second.Name
SET first=second

The query is still processing, but is there a more efficient way of doing this?
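A rough Python analogue of what this query asks for may help explain why it runs for so long. `START first=node(*), second=node(*)` pairs every node with every node, so the `WHERE` conditions are evaluated on n × n candidate pairs. The sketch below is an illustration only: the data and the `cross_product_matches` helper are hypothetical, and it does not model Neo4j's actual execution plan. It simply counts the pairs it inspects.

```python
from itertools import product

# Hypothetical data mirroring the example table: (id, properties).
nodes = [
    (1, {"Name": "Bob", "Age": 23}),
    (2, {"Name": "Amy", "Age": 45}),
    (3, {"Name": "Sam", "Age": 9}),
    (4, {"Name": "Bob"}),
    (5, {"Name": "Amy"}),
    (6, {"Name": "Sam"}),
]


def cross_product_matches(nodes):
    """Pair every node with every node, as node(*), node(*) does, and apply the filters."""
    inspected = 0
    matches = []
    for (first_id, first), (second_id, second) in product(nodes, nodes):
        inspected += 1
        # Same conditions as the WHERE clauses: both named, only second has Age, names equal.
        if ("Name" in first and "Name" in second
                and "Age" in second and "Age" not in first
                and first["Name"] == second["Name"]):
            matches.append((first_id, second_id))
    return matches, inspected


matches, inspected = cross_product_matches(nodes)
print(matches, inspected)  # [(4, 1), (5, 2), (6, 3)] 36
```

Six nodes already mean 36 pairs for three matches; with a million nodes the same pattern would mean 10^12 pairs.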


This creates a cross product between the two sets of nodes, so it will be expensive. It is better to look the nodes up by Name through the index.

START first=node(*), second=node(*) 
WHERE has(first.Name) and has(second.Name) and has(second.Age) and NOT(has(first.Age))
WITH first, second
SKIP 20000 LIMIT 20000
WHERE first.Name= second.Name
SET first=second

And you probably have to paginate the processing as well.
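Paginating here means running the same statement repeatedly with a fixed LIMIT and a SKIP that advances by that LIMIT on each run, until a run returns nothing. The minimal Python sketch below shows only that loop; `run_page` is a hypothetical stand-in for executing the query with given SKIP and LIMIT values (here it just slices an in-memory list), and the page size of 20000 matches the queries in this answer.

```python
def paginate(run_page, limit):
    """Call run_page(skip, limit) with a growing skip until a page comes back empty."""
    skip = 0
    while True:
        page = run_page(skip, limit)
        if not page:
            break
        yield page
        skip += limit  # next run starts where this one stopped


# Hypothetical stand-in for running the query with SKIP/LIMIT: slice a list of rows.
rows = list(range(50_000))
pages = list(paginate(lambda skip, limit: rows[skip:skip + limit], 20_000))
print([len(page) for page in pages])  # [20000, 20000, 10000]
```

One caveat to check against your data: if a run's updates remove rows from later runs (in the first query, a node that receives `Age` no longer matches `NOT(has(first.Age))`), an advancing SKIP jumps over unprocessed rows. In that case, keeping SKIP at 0 and repeating until a run changes nothing avoids the gap.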

START n=node:node_auto_index("Name:*")
WITH n.Name as name, collect(n) as nodes
SKIP 20000 LIMIT 20000
WHERE length(nodes) = 2
WITH head(filter(x in nodes : not(has(x.Age)))) as first, head(filter(x in nodes : has(x.Age))) as second
SET first=second
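The index-based query replaces the pairwise comparison with grouping: `collect(n)` per Name gathers each name's nodes in one pass, and the two `filter` calls pick the node without `Age` (the relationship owner) and the node with it. A rough Python analogue of that grouping follows; the data and the `group_pairs` helper are hypothetical, and unlike Cypher's `head` of an empty list (which yields null) it simply skips groups where either side is missing.

```python
from collections import defaultdict

# Hypothetical data mirroring the example table: (id, properties).
nodes = [
    (1, {"Name": "Bob", "Age": 23}),
    (2, {"Name": "Amy", "Age": 45}),
    (3, {"Name": "Sam", "Age": 9}),
    (4, {"Name": "Bob"}),
    (5, {"Name": "Amy"}),
    (6, {"Name": "Sam"}),
]


def group_pairs(nodes):
    """Group nodes by Name and return (first, second) ids for groups of exactly two."""
    groups = defaultdict(list)
    for node_id, props in nodes:  # one pass, like collect(n) grouped by n.Name
        groups[props["Name"]].append((node_id, props))

    pairs = []
    for members in groups.values():
        if len(members) != 2:  # WHERE length(nodes) = 2
            continue
        without_age = [i for i, p in members if "Age" not in p]
        with_age = [i for i, p in members if "Age" in p]
        if without_age and with_age:
            # head(filter(...)) takes the first match from each list.
            pairs.append((without_age[0], with_age[0]))
    return pairs


print(group_pairs(nodes))  # [(4, 1), (5, 2), (6, 3)]
```

Each node is visited once while building the groups, so the work grows roughly linearly with the number of named nodes instead of with its square; the SKIP/LIMIT clause then bounds how many groups a single run updates.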