0
votes

I'm loading relationships into my Neo4j graph database using LOAD CSV. The nodes are already created. I have four types of relationships to create from four CSV files (file 1: 59 relationships; file 2: 905 relationships; file 3: 173,000 relationships; file 4: over 1 million relationships). The Cypher queries execute without errors. However, file 1 (59 relationships) takes 25 seconds, file 2 took 6.98 minutes, and file 3 has been running for more than 2 hours. I'm not sure whether these execution times are normal, given Neo4j's ability to handle millions of relationships. A sample Cypher query I'm using is given below.

load csv with headers from
"file:/sample.csv"
as rels3
match (a:Index1 {Filename: rels3.Filename})
match (b:Index2 {Field_name: rels3.Field_name})
create (a)-[:relation1 {type: rels3.`relation1`}]->(b)
return a, b

'a' and 'b' are two indices I created for two of the preloaded node categories, hoping to speed up the lookup.

Additional information:
Number of nodes (category a): 1791
Number of nodes (category b): 3341

Is there a faster way to load this? Does LOAD CSV normally take this long, or am I going wrong somewhere?


2 Answers

1
votes

Create an index on Index1.Filename and Index2.Field_name:

CREATE INDEX ON :Index1(Filename);
CREATE INDEX ON :Index2(Field_name);

Verify these indexes are online:

:schema
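
For readers on a newer server: as far as I know, the CREATE INDEX ON syntax shown above was deprecated in Neo4j 4.x and removed in 5.x. A rough equivalent, assuming Neo4j 4.x or later and the same labels and properties as in the question, would be:

```cypher
// Assumes Neo4j 4.x or later; on 3.x keep the CREATE INDEX ON form shown above.
CREATE INDEX FOR (n:Index1) ON (n.Filename);
CREATE INDEX FOR (n:Index2) ON (n.Field_name);

// Lists all indexes with their state; both should report ONLINE.
SHOW INDEXES;
```

The indexes are populated in the background, so it is worth waiting until both are online before starting the load.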

Verify your query is using the indexes by adding PROFILE to the start of your query and looking at the execution plan to see if the indexes are being used.
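
As a sketch of what that could look like with the query from the question: note that PROFILE actually executes the statement, so the LIMIT of 100 rows here is my own addition to keep the test run small.

```cypher
// PROFILE runs the query for real, so this sketch only processes the first 100 rows.
PROFILE
LOAD CSV WITH HEADERS FROM "file:/sample.csv" AS rels3
WITH rels3 LIMIT 100
MATCH (a:Index1 {Filename: rels3.Filename})
MATCH (b:Index2 {Field_name: rels3.Field_name})
CREATE (a)-[:relation1 {type: rels3.`relation1`}]->(b)
```

In the resulting plan you would hope to see NodeIndexSeek operators for both MATCH clauses. A NodeByLabelScan there suggests the index is not being used, which means every CSV row scans all nodes with that label.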


1
votes

What I like to do before running a query is run EXPLAIN first to see if there are any warnings. I have fixed many a query thanks to the warnings.
(Simply prepend EXPLAIN to your query.)
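
For example, with the query from the question this would be roughly the following. EXPLAIN only plans the query and does not execute it, so it is safe to try against the full file.

```cypher
// EXPLAIN shows the plan and any warnings without creating anything.
EXPLAIN
LOAD CSV WITH HEADERS FROM "file:/sample.csv" AS rels3
MATCH (a:Index1 {Filename: rels3.Filename})
MATCH (b:Index2 {Field_name: rels3.Field_name})
CREATE (a)-[:relation1 {type: rels3.`relation1`}]->(b)
```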

Also, consider dropping the RETURN clause, so the load does not have to send a and b back for every row. After the query finishes, you can run a separate query to view the nodes.
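
A minimal sketch of that, reusing the query from the question; the count query at the end is just one way I might check the result, not something from the original post.

```cypher
// Same load as in the question, without the RETURN clause.
LOAD CSV WITH HEADERS FROM "file:/sample.csv" AS rels3
MATCH (a:Index1 {Filename: rels3.Filename})
MATCH (b:Index2 {Field_name: rels3.Field_name})
CREATE (a)-[:relation1 {type: rels3.`relation1`}]->(b);

// Afterwards, check how many relationships were created.
MATCH (:Index1)-[r:relation1]->(:Index2)
RETURN count(r);
```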

I create roughly 20M relationships in about 54 mins using a query very similar to yours.

Indexes are important because that's how Neo4j finds the nodes.