multiple loads in neo4j

Question

I have loaded some data into a Neo4j graph database using the batch importer. If I now have to load more data, do I have to keep track externally of what was inserted, or are there standard Neo4j features that can be used to:

1) get the id of the last node inserted, so that I know the id for the next node to be inserted and can index accordingly.

2) get the list of nodes already present in the database, so that I can check the uniqueness of the nodes that are going to be inserted. If a node already exists in the database, I will just reuse its id and won't create a new node.

3) check the uniqueness of the triplets. Suppose a triplet "January Month is_a" is already present in the Neo4j database and the new data that I want to insert contains the same triplet; I would like not to insert it, as it would give me duplicate results.

For example, suppose you add the following data to a Neo4j graph database using batch-import: https://github.com/jexp/batch-import

$ cat nodes.csv
name age works_on
Michael 37 neo4j
Selina 14
Rana 6
Selma 4

$ cat nodes_index.csv
0 name age works_on
1 Michael 37 neo4j
2 Selina 14
3 Rana 6
4 Selma 4

$ cat rels.csv
start end type since counter:int
1 2 FATHER_OF 1998-07-10 1
1 3 FATHER_OF 2007-09-15 2
1 4 FATHER_OF 2008-05-03 3
3 4 SISTER_OF 2008-05-03 5
2 3 SISTER_OF 2007-09-15 7

Now, if you have to add more data to the same database, you will need to know the following things:

1) if the nodes already exist, what their ids are, so that you can use them while creating a triplet; if not, create a list of such nodes (those not in the database), then take an id that was not used in the last import as the starting id for a new nodes_index.csv

2) if a triplet already exists in the database, don't create it again, as it will produce duplicate results when running Cypher queries against the database.

It seems the same issue has been raised here as well: https://github.com/jexp/batch-import/issues/27

Thanks!

Mohamed Habib · Accepted Answer · 2013-03-26T20:22:52

1- Why do you need to know the last node id? You don't need to know the id to insert a new node; it will be added automatically at the first free id in the graph.

2- For uniqueness, why not use a CREATE UNIQUE query (for nodes and relationships as well)?

You can check the reference here: http://docs.neo4j.org/chunked/1.8/cypher-query-lang.html
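
If you do end up tracking things externally, as the question describes, the bookkeeping could look roughly like the Python sketch below. It is not a feature of Neo4j or batch-import, just an illustration under stated assumptions: nodes are treated as unique by name, new ids continue from the highest id already known, and the function name plan_incremental_load, the node "Tom" and the BROTHER_OF type are made up for the example. How the known nodes and relationships get read back from the database is left out.

```python
def plan_incremental_load(existing_nodes, existing_rels, new_triplets):
    """Work out which nodes and relationships still need to be created.

    existing_nodes: dict mapping node name -> id already in the database
    existing_rels:  set of (start_id, end_id, rel_type) already in the database
    new_triplets:   iterable of (start_name, end_name, rel_type) to load
    """
    node_ids = dict(existing_nodes)
    # Assumption: new ids simply continue after the highest known id.
    next_id = max(node_ids.values(), default=0) + 1
    seen_rels = set(existing_rels)
    nodes_to_create = []
    rels_to_create = []

    for start_name, end_name, rel_type in new_triplets:
        # Reuse the id of a known node, otherwise assign the next free one.
        for name in (start_name, end_name):
            if name not in node_ids:
                node_ids[name] = next_id
                nodes_to_create.append((next_id, name))
                next_id += 1
        rel = (node_ids[start_name], node_ids[end_name], rel_type)
        # Skip triplets that are already present or repeated in this batch.
        if rel not in seen_rels:
            seen_rels.add(rel)
            rels_to_create.append(rel)

    return nodes_to_create, rels_to_create


# State after the first import shown in the question.
existing_nodes = {"Michael": 1, "Selina": 2, "Rana": 3, "Selma": 4}
existing_rels = {
    (1, 2, "FATHER_OF"), (1, 3, "FATHER_OF"), (1, 4, "FATHER_OF"),
    (3, 4, "SISTER_OF"), (2, 3, "SISTER_OF"),
}
# Hypothetical second batch: one duplicate, one new node, one repeat.
new_triplets = [
    ("Michael", "Selina", "FATHER_OF"),
    ("Michael", "Tom", "FATHER_OF"),
    ("Tom", "Selma", "BROTHER_OF"),
    ("Michael", "Tom", "FATHER_OF"),
]
nodes_to_create, rels_to_create = plan_incremental_load(
    existing_nodes, existing_rels, new_triplets
)
print(nodes_to_create)
print(rels_to_create)
```

Only the rows in nodes_to_create and rels_to_create would then go into the next nodes and rels files, so existing nodes keep their ids and no triplet is written twice.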
