
I am a newbie to Neo4j. I did some research before posting this question here, but couldn't find a solution that works for me. I have an Ubuntu VM with 32GB RAM. I am currently running Neo4j Community Edition 4.0.2 with the following settings in my neo4j.conf file:

dbms.memory.heap.initial_size=5120m
dbms.memory.heap.max_size=5120m
dbms.memory.pagecache.size=10g

I am trying to import CSV files into the default Neo4j database using cypher-shell. It works well for small files, but for CSV files larger than 1GB it fails with the error below:

Can't allocate 524288 bytes due to exceeding memory limit; used=2147266560, max=2147483648

I tried setting JAVA_OPTS=-Xmx4G to increase the Java heap space, but it still fails with the same error. Can someone please assist? Is it the cypher-shell utility that needs more heap space, or the Neo4j database itself? And if cypher-shell cannot be used to import huge files, what are my other options?

Edit: I am using a periodic commit of 200 rows. I have the Cypher script below in a .cypher file and am running it with cypher-shell:

CREATE INDEX ON :Review(nr);

USING PERIODIC COMMIT 200 LOAD CSV WITH HEADERS FROM "file:///12Review.csv" AS row
MERGE (R:Review {nr: toInteger(row.nr)})
WITH row, R
FOREACH (ignoreMe IN CASE WHEN row.reviewDate IS NULL THEN [] ELSE [1] END | SET R.reviewDate = row.reviewDate)
FOREACH (ignoreMe IN CASE WHEN row.title IS NULL THEN [] ELSE [1] END | SET R.title = row.title)
FOREACH (ignoreMe IN CASE WHEN row.rating1 IS NULL THEN [] ELSE [1] END | SET R.rating1 = row.rating1)
FOREACH (ignoreMe IN CASE WHEN row.rating2 IS NULL THEN [] ELSE [1] END | SET R.rating2 = row.rating2)
FOREACH (ignoreMe IN CASE WHEN row.rating3 IS NULL THEN [] ELSE [1] END | SET R.rating3 = row.rating3)
FOREACH (ignoreMe IN CASE WHEN row.rating4 IS NULL THEN [] ELSE [1] END | SET R.rating4 = row.rating4)
Hello, we really need more details as to how you're doing the import. LOAD CSV? If so, are you using PERIODIC COMMIT? What's the import query? Have you done an EXPLAIN of the query to determine whether appropriate indexes have been created and are being used? – InverseFalcon
@InverseFalcon updated my question to add the information you requested – achyuthgurram

1 Answer


Running an EXPLAIN on this, I'm seeing an Eager operator in the query plan. The Eager basically cancels out the periodic commit and causes all results to manifest in memory, which is what's blowing the heap. The FOREACHes here are what introduce it, so you won't be able to use this approach with periodic commit.
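
You can see this for yourself by prefixing the statement with EXPLAIN, which only plans the query without executing it, so it's safe to run even against a huge file. A minimal sketch against the same file, trimmed to a single FOREACH for brevity; look for an Eager operator in the plan output:

// EXPLAIN only plans; no rows are read and nothing is written
EXPLAIN USING PERIODIC COMMIT 200 LOAD CSV WITH HEADERS FROM "file:///12Review.csv" AS row
MERGE (R:Review {nr: toInteger(row.nr)})
WITH row, R
FOREACH (ignoreMe IN CASE WHEN row.title IS NULL THEN [] ELSE [1] END | SET R.title = row.title)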

Instead, try using coalesce() to take the first non-null value: attempt the row value first, then fall back to the node's existing value:

USING PERIODIC COMMIT 10000 LOAD CSV WITH HEADERS FROM "file:///12Review.csv" AS row
MERGE (R:Review {nr: toInteger(row.nr)})
SET R.reviewDate = coalesce(row.reviewDate, R.reviewDate),
    R.title = coalesce(row.title, R.title),
    R.rating1 = coalesce(row.rating1, R.rating1),
    R.rating2 = coalesce(row.rating2, R.rating2),
    R.rating3 = coalesce(row.rating3, R.rating3),
    R.rating4 = coalesce(row.rating4, R.rating4)
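
With the FOREACHes gone, the Eager operator should disappear from the EXPLAIN output and periodic commit actually takes effect; that's also why I bumped the batch size from 200 up to 10000, since larger batches generally commit faster once memory is no longer the constraint. You can run it the same way as before, e.g. by piping the script into cypher-shell (a sketch; import.cypher is a placeholder name, and substitute your own credentials):

# import.cypher is hypothetical; point this at your actual script file
cat import.cypher | bin/cypher-shell -u neo4j -p <password>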