I need to create relationships between all nodes that have the same property value.

For example I can use the following query:

match (p1:Person), (p2:Person)
where p1 <> p2 and p1.someproperty = p2.someproperty
merge(p1)-[r:Relationship]-(p2)
return p1,r, p2

But with about 200k nodes, this script runs for quite a long time.

Are there any other faster ways to create such relationships?

Thanks

1 Answer

The query you wrote first builds a Cartesian product of all pairs of Person nodes, then filters each pair to find the ones that actually match, and only then creates the relationships. That is very expensive: an O(n^2) operation, which for 200k nodes means roughly 40 billion pairs to compare.

Instead, you may want to go through the Person nodes just once, and for each one look up the other Person nodes with the same property value and create the relationship.

You should also see greatly improved performance if you have an index on the property in question. Without one, each lookup is a scan over all nodes with that label, which is another factor in the slow query. (A unique constraint also creates an index, but it won't work here, since several nodes are expected to share the same value.)
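
For reference, here is a minimal sketch of creating such an index. It assumes Neo4j 4.x or later, and the index name person_someproperty is made up for illustration:

```cypher
// Index on Person.someproperty, so the lookup for matching nodes can use
// an index seek instead of scanning every Person node.
// The name person_someproperty is hypothetical; pick your own.
create index person_someproperty for (p:Person) on (p.someproperty)

// Older 3.x releases use this form instead:
// create index on :Person(someproperty)
```

The index is populated in the background, so on a large graph it may take a moment to come online before queries can benefit from it.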

I'd also encourage you not to return the nodes and relationships if possible, since the result is likely in the neighborhood of thousands or hundreds of thousands of rows. That's probably another factor.

match (p1:Person)
with p1
match (p2:Person)
where p2.someproperty = p1.someproperty and p1 <> p2
merge(p1)-[r:Relationship]-(p2)

You should be able to EXPLAIN both this query and your old one and see how each of them is going to run.
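
As a sketch of that check: prefixing a query with EXPLAIN returns the planned operators without executing it, so no relationships are created. Applied to the rewritten query, it would look like this:

```cypher
// EXPLAIN only plans the query; nothing is written to the graph.
explain
match (p1:Person)
with p1
match (p2:Person)
where p2.someproperty = p1.someproperty and p1 <> p2
merge (p1)-[r:Relationship]-(p2)
```

In the resulting plan, a CartesianProduct operator over two label scans suggests that every pairing is still being compared, while a NodeIndexSeek on the p2 side suggests the index is being used. PROFILE shows actual row counts as well, but it runs the query for real.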