I have used the import tool to read in ~1 million nodes. Now it is time to set relationships. (Unfortunately, it looks like you have to have relationships predetermined explicitly in a csv if you want to use the import tool, so that is out of the question.)

First thing I did was to put an index on the nodes.

Next, I wrote the following statement, and I'm wondering whether it is my problem: even with an index, might it produce too many Cartesian products?

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM
'file:///home/monica/...relationship.csv' AS line
MATCH (p1:Player {player_id: line.player1_id}),  (p2:Player {player_id: line.player2_id})
MERGE (p1)-[:VERSUS]-(p2)  

Apparently USING PERIODIC COMMIT 500 didn't help, because I still got this error:

Java heap space  

Googling around, I learned that it might help to raise the memory settings in the neo4j-wrapper.conf file, so I increased them all the way to 4 GB (I have an 8 GB system):

wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096  

Still got the same error.

Now, I'm stuck. I can't think of any other strategies, besides:

1) rewrite the statement
2) use a system with more RAM?
3) find some other way to run this in batches?

Any advice would be awesome. Thanks to the neo4j SO community in advance.

1 Answer

Do you have an index or a unique constraint on :Player(player_id)? If the former, drop the index and add a unique constraint instead. Without the constraint, multiple Player nodes can share the same player_id, which can cause Cartesian products: for example, if a player_id were duplicated 10 times on each side, every line of your CSV would produce 10 × 10 = 100 combinations.
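As a rough sketch of that check, the statements below first look for duplicated player_id values and then swap the index for a unique constraint. The label and property names come from the question; the constraint syntax shown is the older `CREATE CONSTRAINT ON ... ASSERT` form, which I am assuming matches your Neo4j version (newer releases use a different syntax). Run each statement separately.

```cypher
// List player_id values that occur on more than one Player node.
// An empty result means there are no duplicates.
MATCH (p:Player)
WITH p.player_id AS id, count(*) AS n
WHERE n > 1
RETURN id, n
ORDER BY n DESC
LIMIT 10;

// Drop the plain index (assumes it was created on :Player(player_id)).
DROP INDEX ON :Player(player_id);

// Add the unique constraint; this fails if duplicates still exist.
CREATE CONSTRAINT ON (p:Player) ASSERT p.player_id IS UNIQUE;
```

The constraint also provides an index for lookups, so the MATCH clauses in the load statement can still find nodes by player_id.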

Once you're sure there is no such duplication, the next thing to check for is an EagerPipe. If the query plan (without PERIODIC COMMIT)

EXPLAIN LOAD CSV WITH HEADERS FROM
'file:///home/monica/...relationship.csv' AS line
MATCH (b1:Player {player_id: line.player1_id}),  (p2:Player {player_id:     line.player2_id})
MERGE (p1)-[:VERSUS]-(p2)  

shows an Eager operator, then PERIODIC COMMIT is not applied; see http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/ for details.

The cases where this can happen become rarer with each more recent Neo4j version.

update

I've just realized that the query uses b1 in the MATCH but p1 in the MERGE, so p1 is not bound to anything and gets created as a new node during the MERGE.

Can you please try:

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM
'file:///home/monica/...relationship.csv' AS line
MATCH (p1:Player {player_id: line.player1_id})
MATCH (p2:Player {player_id: line.player2_id})
MERGE (p1)-[:VERSUS]-(p2)
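
After the load finishes, a couple of sanity checks might help confirm the result. This is only a sketch: it assumes the label and relationship type from the statement above, and that nodes without any label in your graph can only have come from an earlier MERGE on an unbound variable. Run each statement separately.

```cypher
// Count the VERSUS relationships; matching with a direction
// counts each relationship exactly once.
MATCH (:Player)-[r:VERSUS]->(:Player)
RETURN count(r) AS versus_count;

// Count nodes that have no label at all, which is what a MERGE
// on an unbound variable would have created.
MATCH (n)
WHERE size(labels(n)) = 0
RETURN count(n) AS unlabeled_nodes;
```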