1
votes

I'm using BatchInserter to initialise my Neo4j database - the data is coming from XML files on my local filesystem.

Suppose one set of files contains node information / properties, and another set has relationship information. I wanted to do two passes: create all the nodes, then set about creating the relationships.

However, the createRelationship method accepts a long id for the nodes, which I don't have in my relationship XML - all of my nodes have a GUID as a property called ID which I use to reference them.

Does using BatchInserter mean the data hasn't been indexed yet, so I won't be able to look up nodes by some other property when creating relationships?

2
I know that with the neo4j-import command (which I think uses BatchInsert) you won't have any indexes, and you need to create them after all of the data is loaded. But you should still be able to create relationships based on properties; it just might be slow until you index them. (Remember that a newly added index might have to process for a bit before it comes ONLINE; see the :schema command in the web console.) - Brian Underwood

2 Answers

1
votes

I usually just keep the mapping from node attribute to node id in an in-memory cache, using an efficient collection implementation such as Trove.

Then for the relationships you can look up the node-id by attribute.
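
A minimal sketch of that cache, assuming the GUIDs are strings and everything fits in memory. The class and method names (NodeIdCache, remember, nodeIdFor) are hypothetical, and a plain java.util.HashMap stands in for a Trove collection to keep the example dependency-free:

```java
import java.util.HashMap;
import java.util.Map;

/** Maps each node's GUID property to the long id handed back when the node was created. */
final class NodeIdCache {
    private final Map<String, Long> idsByGuid = new HashMap<>();

    /** Pass 1: call right after creating a node, with the id returned by inserter.createNode(...). */
    void remember(String guid, long nodeId) {
        if (idsByGuid.putIfAbsent(guid, nodeId) != null) {
            throw new IllegalStateException("Duplicate GUID: " + guid);
        }
    }

    /** Pass 2: resolve a GUID found in the relationship XML to the node id. */
    long nodeIdFor(String guid) {
        Long nodeId = idsByGuid.get(guid);
        if (nodeId == null) {
            throw new IllegalArgumentException("No node was created for GUID: " + guid);
        }
        return nodeId;
    }
}
```

In the second pass, the two ids returned by nodeIdFor can be passed straight to inserter.createRelationship. For very large imports, a primitive-valued map such as Trove's TObjectLongHashMap would avoid boxing every id.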

0
votes

I found I was able to add nodes to an index as I insert them.

Creating the index:

BatchInserter inserter = BatchInserters.inserter( "data/folder" );
BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider( inserter );
BatchInserterIndex index = indexProvider.nodeIndex("myindex", MapUtil.stringMap( "type", "exact" ) );

Then each time I insert a node, add it to the index as well:

Label label = DynamicLabel.label("person");
Map<String, Object> properties = new HashMap<>();
properties.put("ID", <some-value-here>);
long newNode = inserter.createNode(properties, label); // returns the new node's long id
index.add(newNode, properties);
index.flush(); // makes the addition visible to later index.get() calls

Which I can query as I like:

IndexHits<Long> hits = index.get("ID", <some-value-here>);
if (hits.size() > 0) {
    long existing = hits.getSingle();
}

I have no idea whether this is a good approach. I guess calling flush on the index after every insert is a bad idea for performance, but it seems to work for me.