I have some duplicate nodes, all with the label Tag. By duplicates I mean two or more nodes with the same name property, for example:

{ name: writing, _id: 57ec2289a90f9a2deece7e6d},
{ name: writing, _id: 57db1da737f2564f1d5fc5a1},
{ name: writing }

The _id field is no longer used, so for all practical purposes these three nodes are the same, except that each of them has different relationships.

What I would like to do is:

  1. Find all duplicate nodes (check)

    MATCH (n:Tag)
    WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
    WHERE count > 1
    RETURN name, nodelist, count
    
  2. Copy all relationships from the duplicate nodes into the first one

  3. Delete all the duplicate nodes

Can this be achieved with a Cypher query, or do I have to write a script in some programming language? (That is what I'm trying to avoid.)

1 Answer

APOC Procedures has some graph refactoring procedures that can help. I think apoc.refactor.mergeNodes() ought to do the trick.

Be aware that in addition to transferring all relationships from the other nodes onto the first node of the list, it will also apply any labels and properties from the other nodes onto the first node. If that's not something you want to do, then you may have to collect incoming and outgoing relationships from the other nodes and use apoc.refactor.to() and apoc.refactor.from() instead.
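As a rough sketch of that alternative (not tested against your data), the three statements below redirect relationships with apoc.refactor.to() and apoc.refactor.from() and then delete the duplicates, leaving labels and properties on the kept node untouched. The choice of "first" node is an assumption here: the nodes are ordered by internal id so that each statement picks the same node to keep. The aliases keep, dups and dup are only illustrative names. Note that a relationship between two duplicates would end up as a self-loop on the kept node, and parallel relationships are not collapsed.

```cypher
// Step 1: point incoming relationships of each duplicate at the kept node.
MATCH (n:Tag)
WITH n ORDER BY id(n)                 // assumption: keep the node with the lowest internal id
WITH n.name AS name, collect(n) AS nodelist
WHERE size(nodelist) > 1
WITH head(nodelist) AS keep, tail(nodelist) AS dups
UNWIND dups AS dup
MATCH (dup)<-[r]-()
CALL apoc.refactor.to(r, keep) YIELD output
RETURN count(output) AS redirectedIncoming;

// Step 2: move the start of outgoing relationships to the kept node.
MATCH (n:Tag)
WITH n ORDER BY id(n)
WITH n.name AS name, collect(n) AS nodelist
WHERE size(nodelist) > 1
WITH head(nodelist) AS keep, tail(nodelist) AS dups
UNWIND dups AS dup
MATCH (dup)-[r]->()
CALL apoc.refactor.from(r, keep) YIELD output
RETURN count(output) AS redirectedOutgoing;

// Step 3: delete the duplicates (DETACH DELETE also removes any relationship left on them).
MATCH (n:Tag)
WITH n ORDER BY id(n)
WITH n.name AS name, collect(n) AS nodelist
WHERE size(nodelist) > 1
UNWIND tail(nodelist) AS dup
DETACH DELETE dup;
```

It is probably worth trying this on a copy of the database first, since step 3 is destructive.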

Here's the query for merging nodes:

    MATCH (n:Tag)
    WITH n.name AS name, COLLECT(n) AS nodelist, COUNT(*) AS count
    WHERE count > 1
    CALL apoc.refactor.mergeNodes(nodelist) YIELD node
    RETURN node
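
If the property merging is the only concern, newer APOC releases accept a config map as a second argument to apoc.refactor.mergeNodes(). The sketch below assumes your installed version supports the properties and mergeRels options (check the APOC documentation for your release); the second statement simply re-runs the duplicate check from the question, which should return no rows after a successful merge.

```cypher
// Merge duplicates, keeping the first node's property values where they are set.
// Assumption: the installed APOC version supports this config map.
MATCH (n:Tag)
WITH n.name AS name, collect(n) AS nodelist
WHERE size(nodelist) > 1
CALL apoc.refactor.mergeNodes(nodelist, {properties: 'discard', mergeRels: true}) YIELD node
RETURN node;

// Sanity check: list any Tag names that still occur more than once.
MATCH (n:Tag)
WITH n.name AS name, count(*) AS count
WHERE count > 1
RETURN name, count;
```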