I have loaded some data into a Neo4j graph database using the batch importer. If I now have to load more data, do I have to keep track externally of what was already inserted, or are there standard Neo4j features that can be used to:
1) get the id of the last node inserted, so that I know which id to assign to the next new node and can index accordingly.
2) get the list of nodes already present in the database, so that I can check the uniqueness of the nodes about to be inserted. If a node already exists in the database, I will reuse its id and not create a new node.
3) check the uniqueness of triplets. Suppose the triplet "January Month is_a" is already present in the Neo4j database and the new data I want to insert contains the same triplet; I would like to skip it, because inserting it again will give me duplicate results.
For example, suppose you add the following data to a Neo4j graph database using batch-import (https://github.com/jexp/batch-import):
$ cat nodes.csv
name age works_on
Michael 37 neo4j
Selina 14
Rana 6
Selma 4
$ cat nodes_index.csv
0 name age works_on
1 Michael 37 neo4j
2 Selina 14
3 Rana 6
4 Selma 4
$ cat rels.csv
start end type since counter:int
1 2 FATHER_OF 1998-07-10 1
1 3 FATHER_OF 2007-09-15 2
1 4 FATHER_OF 2008-05-03 3
3 4 SISTER_OF 2008-05-03 5
2 3 SISTER_OF 2007-09-15 7
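To make the external bookkeeping concrete, here is a minimal sketch of how I could rebuild the state of the previous import from the files above. It assumes the files are tab-separated, that the first column of nodes_index.csv is the node id and the second is a unique name, and that those ids match what ended up in the database. The function name load_previous_import is just my own placeholder, not part of batch-import.

```python
import csv
import io


def load_previous_import(nodes_index_file, rels_file, delimiter="\t"):
    """Rebuild lookups from the files used in the previous import.

    Assumption: column 0 of the nodes index is the node id and column 1
    is a name that uniquely identifies the node.
    """
    name_to_id = {}
    rows = csv.reader(nodes_index_file, delimiter=delimiter)
    next(rows)  # skip the header line
    for row in rows:
        if row:
            name_to_id[row[1]] = int(row[0])

    triplets = set()
    rows = csv.reader(rels_file, delimiter=delimiter)
    next(rows)  # skip the header line
    for row in rows:
        if row:
            # (start id, end id, relationship type) identifies a triplet
            triplets.add((int(row[0]), int(row[1]), row[2]))

    # First id that was not used by the previous import
    next_id = max(name_to_id.values(), default=0) + 1
    return name_to_id, triplets, next_id


# Same sample data as above, written out with explicit tabs
nodes_index = io.StringIO(
    "0\tname\tage\tworks_on\n"
    "1\tMichael\t37\tneo4j\n"
    "2\tSelina\t14\n"
    "3\tRana\t6\n"
    "4\tSelma\t4\n"
)
rels = io.StringIO(
    "start\tend\ttype\tsince\tcounter:int\n"
    "1\t2\tFATHER_OF\t1998-07-10\t1\n"
    "1\t3\tFATHER_OF\t2007-09-15\t2\n"
    "1\t4\tFATHER_OF\t2008-05-03\t3\n"
    "3\t4\tSISTER_OF\t2008-05-03\t5\n"
    "2\t3\tSISTER_OF\t2007-09-15\t7\n"
)
name_to_id, triplets, next_id = load_previous_import(nodes_index, rels)
print(next_id)  # prints 5
```

This only works as long as I keep every import file around, which is exactly the external tracking I would like to avoid if Neo4j can do it for me.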
Now, if you have to add more data to the same database, you need to know the following:
1) whether the nodes already exist and, if so, what their ids are, so that you can use those ids when creating triplets. For nodes that are not yet in the database, you need to build a list of them and assign ids starting from one that was not used in the last import, in order to create a new nodes_index.csv.
2) whether a triplet already exists in the database, so that it is not created again, since that would produce duplicate results when running Cypher queries against the database.
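The two steps above could be sketched roughly like this if I do the tracking myself. This is only an illustration under my own assumptions: the name-to-id mapping and the set of existing triplets are already known, node names are unique, and the function name plan_incremental_import and the node "Emil" are made up for the example.

```python
def plan_incremental_import(name_to_id, existing_rels, incoming):
    """Work out which nodes and triplets are actually new.

    name_to_id:    {name: id} for nodes already in the database
    existing_rels: set of (start_id, end_id, type) already in the database
    incoming:      iterable of (start_name, end_name, type) to be added
    """
    name_to_id = dict(name_to_id)  # copy, so the caller's mapping is untouched
    seen = set(existing_rels)
    next_id = max(name_to_id.values(), default=0) + 1

    new_nodes, new_rels = [], []
    for start_name, end_name, rel_type in incoming:
        ids = []
        for name in (start_name, end_name):
            if name not in name_to_id:
                # Unknown node: give it the next unused id
                name_to_id[name] = next_id
                new_nodes.append((next_id, name))
                next_id += 1
            ids.append(name_to_id[name])

        triplet = (ids[0], ids[1], rel_type)
        if triplet not in seen:
            # Only keep triplets that are not already present
            seen.add(triplet)
            new_rels.append(triplet)
    return new_nodes, new_rels


known_nodes = {"Michael": 1, "Selina": 2, "Rana": 3, "Selma": 4}
known_rels = {
    (1, 2, "FATHER_OF"), (1, 3, "FATHER_OF"), (1, 4, "FATHER_OF"),
    (3, 4, "SISTER_OF"), (2, 3, "SISTER_OF"),
}
incoming = [
    ("Michael", "Selina", "FATHER_OF"),  # already in the database
    ("Michael", "Emil", "FATHER_OF"),    # new node and new triplet
    ("Michael", "Emil", "FATHER_OF"),    # duplicate within the new data
]
new_nodes, new_rels = plan_incremental_import(known_nodes, known_rels, incoming)
print(new_nodes)  # [(5, 'Emil')]
print(new_rels)   # [(1, 5, 'FATHER_OF')]
```

The new_nodes and new_rels lists would then be written out as the next nodes_index.csv and rels.csv. What I would like to know is whether Neo4j or the batch importer can do this lookup for me instead.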
It seems the same issue has been raised here as well: https://github.com/jexp/batch-import/issues/27
Thanks!