Some observations:
- Your query will try to perform 2*600*599 (or 718,800)
MERGE
operations in a single transaction. The reason for the factor of 2
is because every pair of words (say, x and y) will be seen twice (as x/y and y/x). You (presumably) only want to perform half that number of operations.
- The x/y and y/x behavior also causes an attempt to ensure there are 2
DISTANCE
relationships for each word pair -- one in either direction. That is (presumably) twice the number of relationships than you want (or need).
- Trying to perform 720K (or even just 360K) operations in a single transaction may cause the DB server to run out of memory.
Here is a modified query that might fix the above issues. The ID(w1) < ID(w2)
test makes sure the 2 words in a pair are not the same AND that the same pair is only processed once. It also uses the APOC procedure apoc.periodic.iterate to create 10K relationships at a time in separate transactions, in parallel.
CALL apoc.periodic.iterate(
'MATCH (w1:Word), (w2:Word) WHERE ID(w1) < ID(w2) RETURN w1, w2',
'CREATE (w1)-[:DISTANCE]->(w2)',
{batchSize:10000, parallel:true}) YIELD batches, total
RETURN *
NOTE 1: This query assumes that you start out without any DISTANCE
relationships in the DB, so it uses the cheaper CREATE
clause instead of MERGE
. If DISTANCE
relationships exist already, then use MERGE
instead (but this could create a second relationship between the same pair if the first relationship was in the opposite direction).
NOTE 2: Performing the batches in parallel should be safe because issue #2 is not possible with the new Cypher code. If 2 transactions were to attempt to create relationships in opposite directions between the same 2 nodes at the same time, that could result in a deadlock, which would cause at least one of the transactions to fail.
NOTE 3: This query assumes that the first statement (with the MATCH
clause) does not itself run out of memory or take too long to process. If that assumption is wrong, then using a suitably modified query with apoc.periodic.commit should work.