
I loaded 2.8 million records using LOAD CSV WITH HEADERS.

When I try to create relationships using the following Cypher script, I get this error: Neo.DatabaseError.General.UnknownError - Java heap space. I also created an index to speed up relationship creation.

CREATE INDEX ON :Entity(ENT_ID)

PROFILE
MATCH(Entity)
MATCH (a:Entity {ENT_ID : Entity.ENT_ID})
WITH Entity, a
MATCH (b:Entity {ENT_ID : Entity.PARENTID})
WITH a,b
MERGE (a)-[r:RELATION]->(b)
RETURN r

I have already gone through other questions from people who faced the same issue, but did not find a solution there, so I am posting this as a new question. My dataset has two columns, ENT_ID and PARENTID, and I am trying to create the relationship between them using the above query.

I have no background in Java or the Java Virtual Machine. Based on other posts, I thought the error would go away with the following settings:

neo4j.conf:

dbms.memory.pagecache.size=3g

-- Initial Java Heap Size (in MB) :

wrapper.java.initmemory=1024

-- Maximum Java Heap Size (in MB) :

wrapper.java.maxmemory=16000

-- other beneficial settings that should boost performance :

wrapper.java.additional.6=-d64
wrapper.java.additional.7=-server
wrapper.java.additional.8=-Xss1024k

JAVA VM Tuning:

 -Xmx4000M
 -Xms4000M 
 -Xmn1000M

I'd appreciate any help

EDIT:

Based on the comment below, I used the following query. It still throws the same Java heap space error:

PROFILE
MATCH (a:Entity)
WHERE a.PARENTID IS NOT NULL
WITH a
MATCH (b:Entity {ENT_ID : a.PARENTID})
MERGE (a)-[r:RELATION]->(b)

Kindly help.

Thanks

The issue is in your Cypher query: MATCH(Entity) also matches all the other nodes, not only :Entity nodes. Replace it with MATCH (en:Entity) WITH en. Right now it produces a cartesian product and the Neo4j server becomes unresponsive. - Usman Maqbool

1 Answer


As Usman commented, the MATCH(Entity) line is useless. Worse, because Entity is a variable name there rather than a label, it matches every node in your db, not just :Entity nodes, and causes a cartesian product.

I think what you want is something like this, processing all :Entity nodes that have a PARENTID and making a RELATION from each of them to its parent:

PROFILE
MATCH (a:Entity)
WHERE a.PARENTID IS NOT NULL
WITH a
MATCH (b:Entity {ENT_ID : a.PARENTID})
MERGE (a)-[r:RELATION]->(b)

This should avoid the cartesian product. With the index on :Entity(ENT_ID), the work should scale roughly linearly with n, where n is the number of :Entity nodes.

I removed the RETURN as you probably do not want to return millions of relationships. Afterwards, if you want a count of those relationships, though, you can use:

MATCH (:Entity)-[r:RELATION]->(:Entity)
RETURN COUNT(r)
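
If the query above still runs out of heap, as the EDIT in the question reports, a likely cause is that all of the relationship creations happen in a single transaction, whose state is held in memory until commit. One way to work around that is to commit in batches. The following is only a sketch, assuming the APOC plugin (a reasonably recent release) is installed on the server. The batch size of 10000 is an arbitrary starting point, not a tuned value.

```cypher
// Assumption: the APOC plugin is installed; apoc.periodic.iterate is not part of core Cypher.
// The first statement streams the child nodes; the second runs once per row,
// and the rows are committed in batches instead of one large transaction.
CALL apoc.periodic.iterate(
  "MATCH (a:Entity) WHERE a.PARENTID IS NOT NULL RETURN a",
  "MATCH (b:Entity {ENT_ID: a.PARENTID}) MERGE (a)-[:RELATION]->(b)",
  {batchSize: 10000, parallel: false}  // batchSize is a guess; adjust to the available heap
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```

Keeping parallel set to false avoids lock contention when many children share the same parent node. I also left out PROFILE here, since it is not needed just to create the relationships.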