
I have a very large (~24 million lines) edge list that I'm trying to import into a Neo4j graph that is already populated with nodes. The CSV file has three columns: from, to, and the period (a relationship property). I first tried the REST batch API with the following Python code:

# get-or-create the two endpoint nodes in the "people" index
batch_queue.append({"method":"POST","to":'index/node/people?uniqueness=get_or_create','id':1,'body':{'key':'name','value':row[0]}})
batch_queue.append({"method":"POST","to":'index/node/people?uniqueness=get_or_create','id':2,'body':{'key':'name','value':row[1]}})
# create the relationship, referring to the two jobs above by their batch ids
batch_queue.append({"method":"POST","to":'{1}/relationships','body':{'to':"{2}","type":"FP%s" % row[2]}})

The third line of the batch failed. I then also tried the following Cypher statement:

USING PERIODIC COMMIT
LOAD CSV FROM "file:///file-name.csv" AS line
MATCH (a:Person {name: line[0]}),(b:Person {name:line[1]})
CREATE (a)-[:FOLLOWS {period: line[2]}]->(b)

This worked at a small scale, but it gave me an "Unknown Error" when run against the whole list (even with smaller periodic commit values).

Any guidance as to what I'm doing incorrectly would be appreciated.

What error are you getting from the third line of the first block? – snorthway
It's returning error 500 / Server Error. – Zach Sheffler

1 Answer


You might want to look into my batch-importer for that: http://github.com/jexp/batch-import

Otherwise for LOAD CSV, see my blog post here: http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/

Use the neo4j-shell for LOAD CSV.

Depending on the memory available, you might have to split the data by moving a window over the file (e.g. 1M rows at a time, as in the query below). Do you have an index or constraint created for :Person(name)? If not, create one before the import (see the statement after the query below).

USING PERIODIC COMMIT
LOAD CSV FROM "file:///file-name.csv" AS line
WITH line
// window: skip the rows already imported, then process the next 1M
SKIP 2000000 LIMIT 1000000
MATCH (a:Person {name: line[0]}),(b:Person {name:line[1]})
CREATE (a)-[:FOLLOWS {period: line[2]}]->(b)
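
If the index or constraint is missing, each MATCH becomes a full label scan over all :Person nodes, which alone can stall an import of this size. As a sketch (assuming Neo4j 2.x Cypher syntax, matching the LOAD CSV statements above), run one of these once, before the import:

CREATE INDEX ON :Person(name);

// or, if names are unique, a uniqueness constraint (which is also backed by an index)
CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;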