
I have a Neo4j database that contains about 90K nodes (collected from Twitter and inserted using py2neo). I want to create a relationship between every pair of these nodes and set a specific value on each relationship based on the nodes it connects. I do this in Python with py2neo, comparing every two nodes and creating the relationship, but it takes too much time: for example, it takes more than 2 hours to create 80K relationships.

Is there a better way to create these relationships that takes less time? Is py2neo slow, or is it because of the data volume?

Thanks in advance.


1 Answer


You are trying to do about 8.1 billion (90K * 90K) operations. If each operation takes 1 millisecond (just for discussion -- this may be way off), then it would take almost 94 days.
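To make the arithmetic concrete, here is a minimal Python sketch of the estimate. The function name `estimate_days` and its parameters are made up for illustration, and the 1 ms per operation is the same "just for discussion" figure as above, not a measurement.

```python
def estimate_days(node_count: int, ms_per_operation: float = 1.0) -> float:
    """Estimate the days needed to process every ordered pair of nodes."""
    # N * N ordered pairs, as in the 90K * 90K estimate above.
    pair_count = node_count * node_count
    total_seconds = pair_count * ms_per_operation / 1000.0
    return total_seconds / 86_400  # 86,400 seconds in a day


if __name__ == "__main__":
    # Assumed cost of 1 ms per operation (for discussion only).
    print(f"{estimate_days(90_000):.2f} days")  # prints "93.75 days"
```

The rate reported in the question (more than 2 hours for 80K relationships, so roughly 90 ms each) would give a far larger figure, on the order of 8,400 days.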

So, you want to avoid "cartesian products", which have a complexity of O(N**2). I would suggest revisiting your use case to see how you can adjust your Neo4j data model so that you do not have to create so many relationships.

[UPDATE]

If you DO NOT really need to create a relationship between ALL node pairs, but only need a relationship between specific node pairs, then the comment from @MichaelHunger suggests an approach with approximately O(N) complexity.

The approach requires that you first create an index; for example, if your nodes have the label Foo and the property of interest for determining whether to create a relationship is bar:

CREATE INDEX ON :Foo(bar)

Then, you can:

  1. MATCH all Foo nodes.
  2. For each matched Foo node, use the index to find all other Foo nodes that have the appropriate bar value (e.g., whose bar is double the current bar).
  3. Create a relationship between the current node and the other node.

    MATCH (n:Foo)
    MATCH (other:Foo) USING INDEX other:Foo(bar)
    WHERE other.bar = n.bar * 2
    CREATE (n)-[:RELATED_TO]->(other);
    

Step 1 has complexity O(N). Each index lookup in step 2 is roughly O(1) as long as only a few nodes match each bar value, so the overall complexity is approximately O(N).
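As an illustration of why the index helps, the same idea can be sketched in plain Python, with a dictionary standing in for the Neo4j index. This is only an analogy under stated assumptions: the helper names `build_index` and `related_pairs` and the sample data are hypothetical, and nothing here talks to Neo4j or py2neo.

```python
from collections import defaultdict


def build_index(nodes):
    """Map each bar value to the ids of the nodes holding it (one O(N) pass)."""
    index = defaultdict(list)
    for node_id, bar in nodes.items():
        index[bar].append(node_id)
    return index


def related_pairs(nodes):
    """Return (n, other) pairs where other.bar == n.bar * 2."""
    index = build_index(nodes)
    pairs = []
    for node_id, bar in nodes.items():
        # Dictionary lookup replaces a scan over all other nodes.
        for other_id in index.get(bar * 2, []):
            pairs.append((node_id, other_id))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample: node id -> bar value.
    sample = {"a": 1, "b": 2, "c": 4, "d": 7}
    print(related_pairs(sample))  # prints [('a', 'b'), ('b', 'c')]
```

Each node does a single lookup instead of being compared against every other node, so the work grows with the number of nodes plus the number of matching pairs rather than with N**2.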