I am using the following commands to load data from a CSV file into Neo4j. The input file is large, with millions of rows. While the query is running I can query the number of nodes to check its progress. Once it stops creating nodes, I assume it moves on to creating relationships, but I am not able to check the progress of that step.

I have two questions:

  1. Does it process the command for each line of the file, i.e. create the nodes and relationships for each source line?
  2. Or does it create all the nodes in one shot and then create the relationships?

Either way, I want to monitor the progress of the following command. It seems to get stuck after creating the nodes, and when I query the number of relationships I get 0.

I created a constraint on the key attribute.

CREATE CONSTRAINT ON (n:Node) ASSERT n.key is UNIQUE;

Here is the Cypher that loads the file.

USING PERIODIC COMMIT
LOAD CSV FROM "file:///data/abc.csv" AS row
MERGE (u:Node {name:row[1],type:row[2],key:row[1]+"*"+row[2]})
MERGE (v:Node {name:row[4],type:row[5], key:row[4]+"*"+row[5]})
CREATE (u) - [r:relatedTo]-> (v)
SET r.type = row[3], r.frequency=toint(trim(row[6]));

1 Answer

Neo4j runs the whole Cypher script once for every row of your CSV file, i.e.:

MERGE (u:Node {name:row[1],type:row[2],key:row[1]+"*"+row[2]})
MERGE (v:Node {name:row[4],type:row[5], key:row[4]+"*"+row[5]})
CREATE (u) - [r:relatedTo]-> (v)
 SET r.type = row[3], r.frequency=toint(trim(row[6]))

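Because of USING PERIODIC COMMIT, Neo4j commits after every batch of rows (1000 rows by default when no number is given).

You can only see changes in your graph once Neo4j has finished processing a batch and committed it, so counts queried from another session advance one batch at a time.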
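
As a rough way to watch progress, you could run count queries from a second session while the load is running. This is only a sketch: it assumes the label Node and the relationship type relatedTo from the question, and the aliases nodes and relationships are just illustrative names.

```cypher
// Number of nodes committed so far
MATCH (n:Node) RETURN count(n) AS nodes;

// Number of relationships committed so far
MATCH ()-[r:relatedTo]->() RETURN count(r) AS relationships;
```

Since the script creates one relatedTo relationship per CSV row, the second count should roughly track the number of rows committed so far, assuming no such relationships existed before the load started.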

But your script is not optimized: the MERGE clauses match on name, type and key together, so they do not take full advantage of the unique constraint on key.

You should consider this script instead:

USING PERIODIC COMMIT
LOAD CSV FROM "file:///data/abc.csv" AS row

MERGE (u:Node {key:row[1]+"*"+row[2]})
  ON CREATE SET u.name = row[1],
                u.type = row[2]

MERGE (v:Node {key:row[4]+"*"+row[5]})
  ON CREATE SET v.name = row[4],
                v.type = row[5]

CREATE (u)-[r:relatedTo]->(v)
 SET r.type = row[3], r.frequency=toint(trim(row[6]));
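
If you want to check that the constraint is actually used, one option is to look at the plan for a single MERGE. This is a sketch that assumes a Neo4j version supporting EXPLAIN; the key, name and type values are made-up sample data, not taken from your file.

```cypher
// EXPLAIN only shows the execution plan; it does not run the query
// or modify the graph. The values below are hypothetical.
EXPLAIN
MERGE (u:Node {key: "foo*bar"})
  ON CREATE SET u.name = "foo",
                u.type = "bar";
```

The plan should show an index seek on :Node(key) rather than a scan over all nodes with the Node label. The exact operator names differ between Neo4j versions.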

Cheers