I'm using the neo4j-import command-line tool to load large CSV files into Neo4j. I've tested it with a subset of the data and it works well. The CSV files total about 200 GB and contain ~10M nodes and ~B relationships. I'm currently using the default Neo4j configuration. Creating the nodes takes hours, and the import then gets stuck at [*SORT:20.89 GB-------------------------------------------------------------------------------] 0
I'm worried that creating the relationships will take even longer, so I would like to know how to speed up the import.
It's a 16 GB machine, and the neo4j-import output shows the following:
free machine memory: 166.94 MB
Max heap memory : 3.48 GB
Should I change the Neo4j configuration to increase memory? Will it help? I'm running neo4j-import with --processors 8. However, the CPU usage of the Java process is only about 1%. Does that look right?
Can someone give me a ballpark loading time for a dataset of this size? It's an 8-core, 16 GB standalone machine.
Is there anything else I should look at to speed up the import?
Updated:
1. The machine does not have an SSD.
2. I ran the top command, and it shows that 85% of RAM is being used by the Java process, which I think belongs to the neo4j-import command.
3. The import command is:
neo4j-import --into /var/lib/neo4j/data/graph.db/ --nodes:Post Posts_Header.csv,posts.csv --nodes:User User_Header.csv,likes.csv --relationships:LIKES Likes_Header.csv,likes.csv --skip-duplicate-nodes true --bad-tolerance 100000000 --processors 8
4. Posts_Header: Post_ID:ID(Post),Message:string,Created_Time:string,Num_Of_Shares:int,e:IGNORE, f:IGNORE
   User_Header: a:IGNORE,User_Name:string,User_ID:ID(User)
   Likes_Header: :END_ID(Post),b:IGNORE,:START_ID(User)
I ran the import on the sample data and it's pretty fast, taking only several seconds. Since I'm using the default Neo4j heap setting and the default Java memory settings, will it help if I tune them?
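Since the headers live in separate files from the data, one cheap sanity check before another multi-hour run is to confirm that each header has the same number of columns as its data file. This is only a rough sketch: `count_cols` is a hypothetical helper name, and the naive comma split assumes the first line of each file contains no quoted commas, which may not hold for a free-text column such as Message.

```shell
# count_cols FILE
# Print the number of comma-separated fields on the first line of FILE.
# Assumption: a naive split on ',' that does not understand CSV quoting.
count_cols() {
  head -n 1 "$1" | awk -F',' '{ print NF }'
}

# Example usage with the file names from the import command above:
#   [ "$(count_cols Posts_Header.csv)" = "$(count_cols posts.csv)" ] \
#     || echo "Posts header and data disagree on column count"
```

A mismatch here would not explain the slow SORT step by itself, but it rules out one source of bad rows before spending hours on the full load.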
:ID field in your csv files? Also can you grab a thread dump when it gets to the point where it stops? Thanks in advance. – Mattias Finné
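For the thread dump requested in the comment, a minimal sketch using the JDK's `jstack` tool could look like the following. It assumes a JDK (not just a JRE) is installed so that `jps` and `jstack` are on the PATH. `dump_threads` is a hypothetical helper name, and the PID and output file name in the usage comment are placeholders.

```shell
# dump_threads PID
# Print a thread dump (including lock information) for the JVM with that PID.
dump_threads() {
  if [ -z "$1" ]; then
    echo "usage: dump_threads PID" >&2
    return 1
  fi
  jstack -l "$1"
}

# Example usage while the import appears stuck:
#   jps -l                               # list running JVMs; note the import's PID
#   dump_threads 12345 > import-threads.txt
```

Taking two or three dumps a few seconds apart usually makes it easier to tell a thread that is blocked from one that is merely slow.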