My question is very similar to this one: "How to create unique nodes and relationships by csv file imported in neo4j?" I have a text file of around 2.5 million lines with two columns, each holding a node id:

1234 345
1234 568
345 984
... ...

Each line represents one relationship (2.5 million in total): first-column node id -[:FOLLOWS]-> second-column node id. The file contains around 80,000 unique nodes.

Based on the link above, I did:

USING PERIODIC COMMIT 1000
LOAD CSV FROM 'file:///home/user_name/Desktop/bigfile.csv' AS line FIELDTERMINATOR ' '
MERGE (n:Userid { id: toInt(line[0]) })
WITH line, n
MERGE (m:Userid { id: toInt(line[1]) })
WITH m,n
MERGE (n)-[:FOLLOWS]->(m)

I am assuming this code:

  • creates node n or m if it doesn't exist (and finds it if it does), then creates a relationship from n to m.
  • if n or m already exists and has many other relationships to and from other nodes, just adds one more relationship from n to m instead of creating a duplicate node.

My main question is how to make this process faster. I am running this on Ubuntu, and I raised the memory setting in conf/neo4j-wrapper.conf from 512 MB to 2048 MB (the maximum my virtual machine allows).

Should I try the import tool instead? Based on the example at neo4j.com/developer/guide-import-csv/ under "Super Fast Batch Importer For Huge Datasets", I put together this command:

./bin/neo4j-import --into mydatabase.db --id-type INTEGER \
                   --nodes allnodes.csv \
                   --delimiter " " \
                   --relationships:FOLLOWS bigfile.csv

To do this, I need to reformat the files so that allnodes.csv contains:

userID:ID(Userid)
1234
5678
...

And bigfile.csv contains:

:START_ID(Userid)   :END_ID(Userid)
1234                 345
1234                 568
345                  984
*Two columns delimited by space*
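
For reference, the reformatting described above can be scripted. This is only a minimal sketch, assuming the original headerless two-column file is saved under the hypothetical name edges.txt and that ids are separated by exactly one space; the three sample rows are written only so the snippet runs on its own.

```shell
#!/bin/sh
# Assumption: edges.txt is a hypothetical name for the original
# headerless file with one "start end" pair per line.
set -eu

# Sample rows so the sketch is runnable; replace with the real file.
printf '1234 345\n1234 568\n345 984\n' > edges.txt

# Node file: header line, then every distinct id from either column.
{
  echo 'userID:ID(Userid)'
  tr ' ' '\n' < edges.txt | sort -un
} > allnodes.csv

# Relationship file: header line, then the original rows unchanged.
{
  echo ':START_ID(Userid) :END_ID(Userid)'
  cat edges.txt
} > bigfile.csv
```

The header here uses a single space between the two fields so that it matches the single-space delimiter passed to the import command.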

And when I run this import, I get this error:

Input error: Expected '--nodes' to have at least 1 valid item, but had 0 []
Caused by:Expected '--nodes' to have at least 1 valid item, but had 0 []
java.lang.IllegalArgumentException: Expected '--nodes' to have at least 1 valid item, but had 0 []

How do I fix this error? And should the CSV files go in the same folder where I run the command (the neo4j folder)?

1 Answer

Your command line probably has the wrong paths for your two CSV files.
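
One way to rule that out is to check that both files are readable at the paths you pass before starting the import. The following is a rough sketch, not a definitive fix: CSV_DIR, check_inputs and run_import are hypothetical names introduced here, and the import flags are simply the ones from the question.

```shell
#!/bin/sh
# Assumption: CSV_DIR is a hypothetical variable pointing at the folder
# that holds the two CSV files; it defaults to the current directory.
set -eu
CSV_DIR="${CSV_DIR:-$PWD}"

# Succeed only if every given file exists and is readable.
check_inputs() {
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      return 1
    fi
  done
}

# Start the import only when both inputs were found.
# The flags are copied unchanged from the command in the question.
run_import() {
  check_inputs "$CSV_DIR/allnodes.csv" "$CSV_DIR/bigfile.csv" || return 1
  ./bin/neo4j-import --into mydatabase.db --id-type INTEGER \
    --nodes "$CSV_DIR/allnodes.csv" \
    --delimiter " " \
    --relationships:FOLLOWS "$CSV_DIR/bigfile.csv"
}
```

Calling run_import from the neo4j folder, with CSV_DIR set to the directory containing the files, either names the file it could not find or hands the tool explicit paths, so the files do not have to sit in the folder you run the command from.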