
Say I have a CSV file containing node information, each line with a unique ID in the first column, and a second CSV file describing edges between those nodes (via their unique IDs). The following Cypher code successfully loads the nodes and then creates the edges. However, can I make it more efficient? My real data set has millions of nodes and tens of millions of edges. Obviously I should use periodic commits and create an index, but can I somehow avoid matching for every single edge and exploit the fact that I already know the unique node IDs for each edge I want to build? Or am I going about this all wrong? I would like to do this entirely in Cypher (no Java).

load csv from 'file:///home/user/nodes.txt' as line
create (:foo { id: toInt(line[0]), name: line[1], someprop: line[2]});

load csv from 'file:///home/user/edges.txt' as line
match (n1:foo { id: toInt(line[0])} ) 
with n1, line
match (n2:foo { id: toInt(line[1])} ) 
// if I had an index I'd use it here with: using index n2:foo(id)
merge (n1) -[:bar]-> (n2) ;

match p = (n)-->(m) return p;
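
For reference, here is the indexed, periodic-commit variant I have in mind. This is just a sketch (untested at scale, and the batch size of 10000 is a guess to be tuned); with an index on :foo(id) in place, each match on id should become an index lookup rather than a label scan:

// create the index first, as its own statement
create index on :foo(id);

using periodic commit 10000
load csv from 'file:///home/user/nodes.txt' as line
create (:foo { id: toInt(line[0]), name: line[1], someprop: line[2]});

// the matches below should pick up the index on :foo(id) automatically
using periodic commit 10000
load csv from 'file:///home/user/edges.txt' as line
match (n1:foo { id: toInt(line[0])} ), (n2:foo { id: toInt(line[1])} )
merge (n1) -[:bar]-> (n2);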

nodes.txt:

0,node0,Some Property 0
1,node1,Some Property 1
2,node2,Some Property 2
3,node3,Some Property 3
4,node4,Some Property 4
5,node5,Some Property 5
6,node6,Some Property 6
7,node7,Some Property 7
8,node8,Some Property 8
9,node9,Some Property 9
10,node10,Some Property 10
...

edges.txt:

0,2
0,4
0,8
0,13
1,4
1,8
1,15
2,4
2,6
3,4
3,7
3,8
3,11
4,10
...
LOAD CSV is not the way to go if you have that much data. You can take a look at the tool Michael Hunger made: github.com/jexp/batch-import – Ron van Weverwijk

1 Answer


Like Ron commented above, LOAD CSV is likely not the way to go for large datasets, and the CSV batch-import tool he links to is great. If you find you cannot easily massage your CSV into a form that works with the batch-import tool, then the Neo4j BatchInserter API is very simple to use: http://docs.neo4j.org/chunked/stable/batchinsert.html