Using Rexster and Titan Graph DB for scalable applications

Question

I have a Python application communicating with a Titan graph database backed by Cassandra.

Python App ---------> Rexster Server + Titan Graph DB + Cassandra.

The "Rexster Server + Titan Graph DB + Cassandra" is inside a single JVM.

My Python application runs on multiple virtual machines, i.e. each virtual machine has an identical copy of my application. The idea is to make the application scalable. For the initial implementation I am using a single instance of "Rexster Server + Titan Graph DB + Cassandra", which means that the backend database is a single node. The application instances running on the different virtual machines all talk to the same backend.

My questions are as follows.

1) I want to make the backend database scalable as well. How can I do this?

2) Do I need to use the same "Rexster + Titan Graph DB" and configure multiple Cassandra nodes?

3) Is Titan Graph DB the best option for this use case? Or can I substitute Titan Graph DB with Neo4j and Rexster with Neo4j server? Why or why not?

gnomeria · Accepted Answer · 2015-03-01T03:15:21

Titan is a highly scalable graph database, as has been demonstrated in their examples. To answer your questions properly, it would help to know how big your project could become. If you intend to deploy a Hadoop cluster, make sure Rexster is configured to connect to the ZooKeeper address of the backend (if the backend is managed by ZooKeeper) rather than to a list of node addresses.

1. I want to make the backend database scalable as well. How can I do this?
If you intend to scale beyond the confines of one machine, see the Titan-Cassandra Configuration page for more information. As for making the backend database scalable: Cassandra and HBase are both very scalable databases, and I suggest reading more about the Hadoop ecosystem to understand how Titan fits into it. You can have many HBase/Cassandra nodes that Rexster talks to.

2. Do I need to use the same "Rexster + Titan Graph DB" and configure multiple Cassandra nodes?
You could start several Rexster servers on different machines in the cluster, each connecting to the same backend. But the graph instance in each Rexster server is independent of the others, so you have to partition your graph operations manually. This setup only helps with a high number of users, not with deep traversals/queries.

3. Is Titan Graph DB the best option for this use case? Or can I substitute Titan Graph DB with Neo4j and Rexster with Neo4j server? Why or why not?
Because it seems you are going to deploy a cluster, I think Titan is the better choice, unless you are willing to pay for the Enterprise edition of Neo4j to get clustering support (see Neo4j editions). Another point to consider: Titan vs OrientDB.
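
To illustrate point 2, here is a minimal sketch of how each copy of the Python application might spread its requests over several Rexster servers that share one backend. The host names and the graph name are hypothetical placeholders, and the URL layout assumes Rexster's REST API on its default port 8182; adjust both to match your own rexster.xml.

```python
import itertools

# Hypothetical Rexster hosts; replace with the servers in your cluster.
REXSTER_HOSTS = [
    "rexster-1.example.com",
    "rexster-2.example.com",
    "rexster-3.example.com",
]
REXSTER_PORT = 8182   # assumed: Rexster's default REST port
GRAPH_NAME = "graph"  # assumed: the graph name configured in rexster.xml


class RexsterPool:
    """Hands out Rexster base URLs in round-robin order."""

    def __init__(self, hosts, port=REXSTER_PORT, graph=GRAPH_NAME):
        if not hosts:
            raise ValueError("at least one Rexster host is required")
        self._urls = ["http://%s:%d/graphs/%s" % (h, port, graph) for h in hosts]
        self._cycle = itertools.cycle(self._urls)

    def next_base_url(self):
        # Each call moves on to the next server in the list.
        return next(self._cycle)

    def vertex_url(self, vertex_id):
        # URL for fetching one vertex by id over REST.
        return "%s/vertices/%s" % (self.next_base_url(), vertex_id)


pool = RexsterPool(REXSTER_HOSTS)
urls = [pool.vertex_url(42) for _ in range(4)]
```

Each URL can then be fetched with whatever HTTP client the application already uses. This only balances request load across the Rexster servers; as noted above, it does nothing to speed up a single deep traversal.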
