I am using the following LOAD CSV Cypher statement to import a CSV file with about 3.5 million records, but only about 3.2 million are imported, so roughly 300,000 records are missing.

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///path/to/csvfile.csv") as line
CREATE (ticket:Ticket {id: line.transaction_hash, from_stop: toInt(line.from_stop), to_stop: toInt(line.to_stop), ride_id: toInt(line.ride_id), price: toFloat(line.price)})
MATCH (from_stop:Stop)-[r:RELATES]->(to_stop:Stop) WHERE toInt(line.route_id) in r.routes
CREATE (from_stop)-[:CONNECTS {ticket_id: ID(ticket)}]->(to_stop)

Note that the Stop nodes were already created in a separate import statement.

When I created only the nodes, without the relationships, all the data was imported. The same import statement also works fine with a smaller CSV file in the same format.

I tried twice just to make sure it wasn't terminated accidentally.

Is there a node-to-relationship limit in Neo4j? Or what else could be the reason?

Neo4j version: 3.0.3. The database directory is 5.31 GiB.

1 Answer

This is probably because whenever the MATCH does not succeed for a line, the entire query for that line (including the first CREATE) also fails.

An OPTIONAL MATCH that finds nothing, on the other hand, does not drop the line: it yields a row with NULL in place of the unmatched variables, so the rest of the query still runs for that line. Try this:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///path/to/csvfile.csv") as line
CREATE (ticket:Ticket {id: line.transaction_hash, from_stop: toInt(line.from_stop), to_stop: toInt(line.to_stop), ride_id: toInt(line.ride_id), price: toFloat(line.price)})
WITH line, ticket
OPTIONAL MATCH (from:Stop)-[r:RELATES]->(to:Stop)
WHERE toInt(line.route_id) in r.routes
FOREACH(x IN CASE WHEN from IS NULL THEN [] ELSE [1] END |
  CREATE (from)-[:CONNECTS {ticket_id: ID(ticket)}]->(to)
);

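The FOREACH clause is a somewhat roundabout way to CREATE the relationship only if the OPTIONAL MATCH succeeded for a line: the CASE expression produces a one-element list only when from is not NULL, so the loop body runs once per matched pair of stops and not at all for an unmatched line.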
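
To check whether unmatched lines really account for the missing records, a read-only query along these lines might help. It is only a sketch: it assumes the same file path, the same route_id column, and the same routes property on RELATES as the statements above, and the aliases route_id, csv_lines and matches are made up for illustration.

```cypher
// Group the CSV by route_id first, so the pattern is checked once per
// distinct route instead of once per CSV line.
LOAD CSV WITH HEADERS FROM "file:///path/to/csvfile.csv" AS line
WITH toInt(line.route_id) AS route_id, count(*) AS csv_lines
// Look for any RELATES relationship that lists this route.
OPTIONAL MATCH (:Stop)-[r:RELATES]->(:Stop)
WHERE route_id IN r.routes
WITH route_id, csv_lines, count(r) AS matches
// Keep only the routes that no RELATES relationship mentions.
WHERE matches = 0
RETURN route_id, csv_lines
ORDER BY csv_lines DESC;
```

If the csv_lines values returned here add up to roughly 300,000, the unmatched MATCH is the likely explanation. A row with a NULL route_id would point at lines whose route_id is empty or not numeric, since toInt returns NULL for those.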