I’m trying to import 20 million nodes and 250 million relationships into Neo4j using the batch-importer. I have 8GB of RAM.

Here are my current settings in batch.properties:

use_memory_mapped_buffers=false
neostore.nodestore.db.mapped_memory=500M
neostore.relationshipstore.db.mapped_memory=2000M
neostore.relationshipgroupstore.db.mapped_memory=10M
neostore.propertystore.db.mapped_memory=1G
neostore.propertystore.db.strings.mapped_memory=200M
neostore.propertystore.db.arrays.mapped_memory=0M
neostore.propertystore.db.index.keys.mapped_memory=15M
neostore.propertystore.db.index.mapped_memory=50M
batch_import.node_index.users=exact
batch_import.csv.quotes=false
cache_type=none
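
As a rough sanity check on these settings, the mapped-memory values can be added up and compared with the 8GB of physical RAM. This is only a sketch: the helper `to_mb` is hypothetical, and it assumes that `G` means 1024 MB. Whatever is left over has to cover the JVM heap, the index, and the operating system.

```python
# Mapped-memory settings copied from the batch.properties listing above.
SETTINGS = """\
neostore.nodestore.db.mapped_memory=500M
neostore.relationshipstore.db.mapped_memory=2000M
neostore.relationshipgroupstore.db.mapped_memory=10M
neostore.propertystore.db.mapped_memory=1G
neostore.propertystore.db.strings.mapped_memory=200M
neostore.propertystore.db.arrays.mapped_memory=0M
neostore.propertystore.db.index.keys.mapped_memory=15M
neostore.propertystore.db.index.mapped_memory=50M
"""


def to_mb(value: str) -> int:
    """Convert a size such as '500M' or '1G' to megabytes (assumes 1G = 1024M)."""
    units = {"M": 1, "G": 1024}
    return int(value[:-1]) * units[value[-1].upper()]


mapped = {}
for line in SETTINGS.splitlines():
    key, _, value = line.partition("=")
    if key.endswith("mapped_memory"):
        mapped[key] = to_mb(value)

total_mb = sum(mapped.values())
ram_mb = 8 * 1024  # the 8GB mentioned in the question

print(f"mapped memory: {total_mb} MB of {ram_mb} MB RAM")
print(f"left for heap, index and OS: {ram_mb - total_mb} MB")
```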

It took around 7 minutes to import the 20 million nodes.

The relationships are much slower: according to the console output, it takes about 13 minutes to import 10 million of them.

At that rate it will take around 5.5 hours (250 / 10 * 13 = 325 minutes) to import all the relationships. Can this be improved?
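
The estimate above can be reproduced with a few lines of arithmetic, which also turns the console figures into per-second rates. This sketch assumes the import rate stays constant for the whole run, which may not hold in practice.

```python
# Figures taken from the question; a constant import rate is assumed.
nodes = 20_000_000
node_minutes = 7
rels_total = 250_000_000
rels_per_batch = 10_000_000   # console reports progress per 10 million
minutes_per_batch = 13

total_minutes = rels_total / rels_per_batch * minutes_per_batch
rels_per_second = rels_per_batch / (minutes_per_batch * 60)
nodes_per_second = nodes / (node_minutes * 60)

print(f"relationships: {total_minutes:.0f} min ({total_minutes / 60:.1f} h)")
print(f"observed rate: {rels_per_second:,.0f} rels/s")
print(f"observed rate: {nodes_per_second:,.0f} nodes/s")
```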

1 Answer

You should try the new import tool that comes with Neo4j 2.2.0-M03.

It uses less memory and scales much better across CPUs.

If you want to go with my batch-importer:

Usually it imports about 1M nodes per second and about 100k to 500k relationships per second.
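
To put those rates in perspective, here is a small sketch that projects how long the import from the question (20 million nodes, 250 million relationships) would take at the typical speeds quoted above. The function name `est_minutes` is made up for illustration, and a constant rate is assumed.

```python
def est_minutes(count: int, per_second: int) -> float:
    """Minutes needed to import `count` items at a constant rate."""
    return count / per_second / 60


NODES = 20_000_000
RELS = 250_000_000

print(f"nodes at 1M/s:  {est_minutes(NODES, 1_000_000):.1f} min")
for rate in (100_000, 500_000):
    print(f"rels at {rate // 1000}k/s: {est_minutes(RELS, rate):.1f} min")
```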

How much heap do you use?

  • use a faster disk
  • use more RAM
  • the index adds additional overhead (just for testing, try to run it without the index)
  • use Linux if you aren't already; if you are, check that the disk scheduler is noop or deadline, not cfq
  • try use_memory_mapped_buffers=true
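
For the disk scheduler check, Linux lists the available schedulers for each block device in `/sys/block/<device>/queue/scheduler`, with the active one in square brackets. The sketch below reads those files and flags anything other than noop or deadline. The function names are illustrative, and on other operating systems it simply prints nothing.

```python
import glob
import re


def active_scheduler(text: str):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def report():
    # Linux only: each block device exposes its scheduler under sysfs.
    for path in sorted(glob.glob("/sys/block/*/queue/scheduler")):
        device = path.split("/")[3]
        with open(path) as f:
            current = active_scheduler(f.read())
        verdict = "ok" if current in ("noop", "deadline") else "consider changing"
        print(f"{device}: {current} ({verdict})")


if __name__ == "__main__":
    report()
```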