neo4j HA structure in embedded mode

Question

I want to use the following structure in my application, which uses a Neo4j database.

Here I am deploying my application on three different servers, each with its own embedded Neo4j database.

I want all of the databases to stay in sync automatically.

Is this suitable for an application with a large amount of data?

I am using Spring Data Neo4j (SDN). How do I configure this structure in SDN?

Do I need the Enterprise edition of Neo4j for this?

Is there any other framework or technology that could be used instead?

I have almost finished setting up a structure like this.

My web application is deployed on:

localhost:8088 xml -> ha.server_id = 1 and dbpath = E:/data1
localhost:8089 xml -> ha.server_id = 2 and dbpath = E:/data2
localhost:8090 xml -> ha.server_id = 3 and dbpath = E:/data3

The Neo4j server is running on:

localhost:7474 properties -> ha.server_id = 1 and dbpath = E:/data1
localhost:7475 properties -> ha.server_id = 2 and dbpath = E:/data2
localhost:7476 properties -> ha.server_id = 3 and dbpath = E:/data3

Now when I run my web application, it gives me this error:

Caused by StoreLockException: Unable to obtain lock on store lock file:

It seems that it cannot get access to the directory, which is already being used by the Neo4j server. How should I configure the XML in my web app?

Are you running standalone servers too? If you are, you should stop them. Neo4j embedded is a fully functional server embedded in your Java application. — JohnMark13
@JohnMark13 But I want clustering, and for clustering I think I have to run standalone servers. — Ankit Gupta
No, you do not. That was what the first comment on my answer below said too. I am running an HA cluster using SDN and no standalone instances. You can mix both. Whatever you are doing, though, you should not have two instances referencing the same db directory. — JohnMark13
So shut down the standalone servers and just use your web application instances, as long as they use the config I provided. — JohnMark13
@JohnMark13 I stopped them. Now, let's say there is no Neo4j server running on my machine, and I have three instances of my web application, each with a different server_id and DBPATH. Is that OK? But now my web applications are not starting up. It seems they are waiting for initial hosts or something; there is no error, they just wait. — Ankit Gupta

JohnMark13 · Accepted Answer · 2014-10-18T18:16:19

In answer to your first question: yes, running Neo4j Enterprise (HA) with SDN is great for a project with large amounts of data sitting behind a load balancer. There is one issue that I am aware of: SDN has not been developed to understand HA. This has caused a problem for me in that certain operations can only be performed (and I mean only) on the master node, so you have to code around that.

SDN with HA is easy to configure once you've found out how! This is the code I use and was originally borrowed from Stefan Armbruster.

<!-- HA settings for this instance; ha.server_id must be unique within the cluster -->
<util:map id="config">
    <entry key="ha.server_id" value="1"/>
    <entry key="ha.initial_hosts" value="thisserver.com:5001,yoursecondserver.com:5001,yourthirdserver.com:5001"/>
</util:map>

<bean id="graphDbFactory" class="org.neo4j.graphdb.factory.HighlyAvailableGraphDatabaseFactory"/>

<!-- The constructor-arg is the store directory; no other running instance may use the same directory -->
<bean id="graphDbBuilder" factory-bean="graphDbFactory" factory-method="newHighlyAvailableDatabaseBuilder">
    <constructor-arg value="/tmp/neo4j"/>
</bean>

<!-- Apply the settings map to the builder -->
<bean id="graphDbBuilderFinal" factory-bean="graphDbBuilder" factory-method="setConfig">
    <constructor-arg ref="config"/>
</bean>

<!-- Build the embedded HA database and shut it down cleanly with the Spring context -->
<bean id="graphDatabaseService" factory-bean="graphDbBuilderFinal" factory-method="newGraphDatabase" destroy-method="shutdown"/>

In the map at the top you can put any of the settings that you might find in the documentation here and here. Once your cluster is running you can drop new nodes in; the requirement is that the ha.server_id value is unique, but ha.initial_hosts does not need to be fully populated. Your life will be simpler if you set values for the ha.server and ha.cluster_server properties.
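As a rough illustration of why explicit ha.server and ha.cluster_server values help, here is a minimal Java sketch that builds a distinct settings map for each of several instances running on one machine (the situation in the question). The class name, method name and the 5000+id / 6000+id port scheme are assumptions made up for this example; only the four property keys come from the Neo4j HA settings. A map like this could be handed to setConfig in place of the util:map above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds HA settings for one member of a cluster running on localhost. */
final class LocalHaConfigSketch {

    /**
     * @param serverId    unique id of this instance, 1..clusterSize (becomes ha.server_id)
     * @param clusterSize number of instances expected to form the initial cluster
     */
    static Map<String, String> forLocalInstance(int serverId, int clusterSize) {
        if (serverId < 1 || serverId > clusterSize) {
            throw new IllegalArgumentException("serverId must be between 1 and clusterSize");
        }
        // Every member lists the cluster ports of all initial members.
        StringBuilder initialHosts = new StringBuilder();
        for (int id = 1; id <= clusterSize; id++) {
            if (id > 1) {
                initialHosts.append(',');
            }
            initialHosts.append("localhost:").append(5000 + id);
        }
        Map<String, String> config = new LinkedHashMap<>();
        config.put("ha.server_id", String.valueOf(serverId));
        config.put("ha.initial_hosts", initialHosts.toString());
        // Assumed port scheme: one cluster port and one data port per instance,
        // so that several instances on the same machine do not collide.
        config.put("ha.cluster_server", "localhost:" + (5000 + serverId));
        config.put("ha.server", "localhost:" + (6000 + serverId));
        return config;
    }
}
```

With this scheme, instance 2 of 3 would listen for cluster traffic on localhost:5002 and list localhost:5001, localhost:5002 and localhost:5003 as its initial hosts. Each instance still needs its own store directory.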

For performance you will want to minimise writes on your slave nodes. There is some excellent documentation on how to achieve this on the Neo site; essentially, you configure your load balancer to make routing decisions based on the responses from the special HA endpoints:

/db/manage/server/ha/master
/db/manage/server/ha/slave
/db/manage/server/ha/available
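A minimal sketch of the kind of check a load balancer or routing layer might make against these endpoints. It assumes, as I understand the Neo4j documentation, that /db/manage/server/ha/master answers 200 on the master and 404 on other members. The class and method names are hypothetical, and the endpoints are served by the Neo4j server's HTTP interface, so they may not be reachable in a purely embedded setup.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical health-check helper for a routing layer in front of an HA cluster. */
final class HaEndpointCheck {

    /** Path of the master status endpoint listed above. */
    static final String MASTER_PATH = "/db/manage/server/ha/master";

    /** Decides whether writes should be routed to a member, given the status of its master endpoint. */
    static boolean acceptsWrites(int masterEndpointStatus) {
        // Assumption: 200 means "this member is the master", anything else means it is not.
        return masterEndpointStatus == HttpURLConnection.HTTP_OK;
    }

    /** Performs a GET and returns the HTTP status; baseUrl is illustrative, e.g. "http://localhost:7474". */
    static int status(String baseUrl, String path) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(baseUrl + path).toURL().openConnection();
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(2000);
        connection.setReadTimeout(2000);
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }
}
```

A caller would combine the two, for example acceptsWrites(status(baseUrl, MASTER_PATH)), and send write traffic only to the member for which it returns true.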

Unfortunately I could not use that, so for non-critical writes I post them off to a message queue which is only processed on the master node.
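The queue idea can be sketched roughly as follows. This is not my production code, just an illustration: the in-memory queue stands in for a real message broker shared by all instances, and the class name, the isMaster supplier and the writer callback are all hypothetical. In an embedded HA application the supplier could be backed by something like HighlyAvailableGraphDatabase.isMaster(), but check that against your Neo4j version.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical sketch: queued writes are applied only while this instance is the master. */
final class MasterOnlyWriteQueue<T> {

    // Stand-in for an external message queue shared by all cluster members.
    private final Queue<T> pending = new ConcurrentLinkedQueue<>();
    private final BooleanSupplier isMaster;
    private final Consumer<T> writer;

    MasterOnlyWriteQueue(BooleanSupplier isMaster, Consumer<T> writer) {
        this.isMaster = isMaster;
        this.writer = writer;
    }

    /** Any instance, master or slave, may enqueue a non-critical write. */
    void submit(T write) {
        pending.add(write);
    }

    /** Applies queued writes while this instance is the master; returns how many were applied. */
    int drain() {
        int applied = 0;
        // Re-check the role before every write, because mastership can change after an election.
        while (isMaster.getAsBoolean()) {
            T next = pending.poll();
            if (next == null) {
                break;
            }
            writer.accept(next);
            applied++;
        }
        return applied;
    }
}
```

On a slave, drain() applies nothing and the writes stay queued; once that instance (or, with a shared broker, whichever instance is master) reports itself as master, the backlog is processed in order.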

HA will automatically keep your data in sync and will enable live backups too. When the master dies, an election is performed to promote another cluster member. Since Neo moved away from ZooKeeper it is trivial to set up.

Now, the tough part: licensing. HA is Enterprise and, as far as I understand the application of the AGPL, you need to either buy a license or open source your code. There is another license model on the site, Personal, which will allow you and a small team (+2 people) to make software which can take up to $100000, but you'll be left with just SO for support...

Any other frameworks? Probably, but none that I am familiar with. Graphene would probably be a good place to start investigating, though, if you want to offload the scaling responsibility.
