Say I have a CSV file containing node information, each line with a unique ID in the first column, and another CSV file describing the edges between those nodes (via their unique IDs). The following Cypher code successfully loads the nodes and then creates the edges. However, can I make it more efficient? My real data set has millions of nodes and tens of millions of edges. Obviously I should use periodic commits and create an index, but can I somehow avoid matching for every single edge and exploit the fact that I already know the unique node IDs for each edge I want to build? Or am I going about this all wrong? I would like to do this entirely in Cypher (no Java).
load csv from 'file:///home/user/nodes.txt' as line
create (:foo { id: toInt(line[0]), name: line[1], someprop: line[2]});
load csv from 'file:///home/user/edges.txt' as line
match (n1:foo { id: toInt(line[0])} )
with n1, line
match (n2:foo { id: toInt(line[1])} )
// if I had an index I'd use it here with: using index n2:foo(id)
merge (n1) -[:bar]-> (n2) ;
match p = (n)-->(m) return p;
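For reference, this is roughly what I mean by "periodic commits and an index" — a sketch of the optimized version as I understand it (assuming Neo4j 2.1+ syntax; the batch size of 10000 is an arbitrary choice on my part):

// create an index on :foo(id) so each edge MATCH becomes an index lookup
create index on :foo(id);

// load nodes in batches to keep transaction state small
using periodic commit 10000
load csv from 'file:///home/user/nodes.txt' as line
create (:foo { id: toInt(line[0]), name: line[1], someprop: line[2]});

// load edges; both matches should now hit the index
using periodic commit 10000
load csv from 'file:///home/user/edges.txt' as line
match (n1:foo { id: toInt(line[0])} )
match (n2:foo { id: toInt(line[1])} )
merge (n1) -[:bar]-> (n2);

Even with this, every edge still costs two index lookups, which is what I am asking whether I can avoid.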
nodes.txt:
0,node0,Some Property 0
1,node1,Some Property 1
2,node2,Some Property 2
3,node3,Some Property 3
4,node4,Some Property 4
5,node5,Some Property 5
6,node6,Some Property 6
7,node7,Some Property 7
8,node8,Some Property 8
9,node9,Some Property 9
10,node10,Some Property 10
...
edges.txt:
0,2
0,4
0,8
0,13
1,4
1,8
1,15
2,4
2,6
3,4
3,7
3,8
3,11
4,10
...